Abstract

When  monitoring  spatial  phenomena  with  wireless  sensor  networks,  selecting  the  best  sensor  placements  is  a  fundamental 
task.  Not  only  should  the  sensors  be  informative,  but  they  should  also  be  able  to  communicate  efficiently.  In  this  paper, 
we  present  a  data-driven  approach  that  addresses  the  three  central  aspects  of  this  problem:  measuring  the  predictive 
quality  of  a  set  of  hypothetical  sensor  locations,  predicting  the  communication  cost  involved  with  these  placements, 
and  designing  an  algorithm  with  provable  quality  guarantees  that  optimizes  the  NP-hard  tradeoff.  Specifically,  we  use 
data  from  a  pilot  deployment  to  build  non-parametric  probabilistic  models  called  Gaussian  Processes  (GPs)  both  for 
the  spatial  phenomena  of  interest  and  for  the  spatial  variability  of  link  qualities,  which  allows  us  to  estimate  predictive 
power  and  communication  cost  of  unsensed  locations.  Surprisingly,  uncertainty  in  the  representation  of  link  qualities 
plays  an  important  role  in  estimating  communication  costs.  Using  these  models,  we  present  a  novel,  polynomial-time, 
data-driven algorithm, pSPIEL, which selects Sensor Placements at Informative and Communication-Efficient Locations.
Our approach exploits two important properties of this problem: submodularity, formalizing the intuition that adding a
node  to  a  small  deployment  can  help  more  than  adding  it  to  a  large  deployment;  and  locality,  under  which  nodes  that  are 
far  from  each  other  provide  almost  independent  information.  Exploiting  these  properties,  we  prove  strong  approximation 
guarantees  for  our  approach.  We  also  show  how  our  placements  can  be  made  robust  against  changes  in  the  environment, 
and how pSPIEL can be used to plan informative paths for exploration using mobile robots. We provide extensive
experimental validation of this practical approach on several real-world placement problems, and build a complete system
implementation  on  46  Tmote  Sky  motes,  demonstrating  significant  advantages  over  existing  methods. 
1 Introduction


Networks of small, wireless sensors are becoming increasingly popular for monitoring spatial phenomena, such as the temperature distribution in a building (Deshpande et al., 2004). Since only a limited number of sensors can be placed, it is important to deploy them at the most informative locations. Moreover, the use of wireless communication to collect data leads to additional challenges. Poor link qualities, which can arise if sensors are too far apart, or due to obstacles such as walls or radiation from appliances, can cause message loss and hence require a large number of retransmissions in order to collect the data effectively. Such retransmissions drastically consume battery power, and hence decrease the overall deployment lifetime of the sensor network. This suggests that communication cost is a fundamental constraint which must be taken into account when placing wireless sensors.

Existing work on sensor placement under communication constraints (Gupta et al., 2003; Kar and Banerjee, 2003; Funke et al., 2004) has considered the problem mainly from a geometric perspective: sensors have a fixed sensing region, such as a disc with a certain radius, and can only communicate with other sensors which are at most a specified distance apart. These assumptions are problematic for two reasons. Firstly, the notion of a sensing region implies that sensors can perfectly observe everything within the region, but nothing outside, which is unrealistic: e.g., the temperature can be highly correlated in some areas of a building but very uncorrelated in others (cf. Figure 2(a)). Moreover, sensor readings are usually noisy, and one wants to make predictions utilizing the measurements of multiple sensors, making it unrealistic to assume that a single sensor is entirely responsible for a given sensing region. Secondly, the assumption that two sensors at fixed locations can either perfectly communicate (i.e., they are “connected”) or not communicate at all (and are “disconnected”) is unreasonable, as it does not take into account variabilities in the link quality due to moving obstacles (e.g., doors), interference with other radio transmissions, and packet loss due to reflections (Cerpa et al., 2005). Figure 1(b) shows link quality estimates (packet transmission probabilities) between a fixed sensor location (sensor 41) and other sensor locations in the sensor network deployment at Intel Research, Berkeley, as shown in Figure 1(a).

In order to avoid the sensing region assumption, previous work (cf. Cressie, 1991) established probabilistic models as an appropriate framework for predicting sensing quality by modeling correlation between sensor locations. Krause et al. (2007) present a method for selecting informative sensor placements based on the mutual information criterion. They show that this criterion, originally proposed by Caselton and Zidek (1984), leads to intuitive placements with superior prediction accuracy when compared to existing methods. Furthermore, they provide an efficient algorithm for computing near-optimal placements with strong theoretical performance guarantees. However, this algorithm does not take communication costs into account.

In this paper, we address the general (and much harder) problem of selecting sensor placements that are simultaneously informative and achieve low communication cost. Note that this problem cannot be solved merely by first finding the most informative locations, and then connecting them up with the least cost; indeed, it is easy to construct examples where such a two-phase strategy performs very poorly. We also avoid the connectedness assumption (sensors are “connected” iff they can perfectly communicate): we use the expected number of retransmissions (De Couto et al., 2003) as a cost metric on the communication between two sensors. This cost metric directly translates to the deployment lifetime of the wireless sensor network. We propose to use the probabilistic framework of Gaussian Processes (Rasmussen and Williams, 2006) not only to model the monitored phenomena, but also to predict communication costs.

Figure 1: (a) Indoor deployment of 54 nodes and an example placement of six sensors (squares) and three relay nodes (diamonds); (b) measured transmission link qualities for node 41; (c) GP fit of link quality for node 41; (d) variance of this GP estimate.

Balancing informativeness of sensor placements with the need to communicate efficiently can be formalized as a novel discrete optimization problem. We present a novel algorithm for this placement problem in wireless sensor networks; the algorithm selects sensor placements achieving a specified amount of certainty, with approximately minimal communication cost. Our algorithm centers around a new technique that we call the modular approximation graph. In addition to allowing us to obtain sensor placements with efficient communication, we show how this technique can be generalized, e.g., to plan informative paths for robots.

When using probabilistic models for sensor placement, it is possible that an optimized sensor placement can become uninformative if the environment changes. In building monitoring, for example, building usage can change, leading to fluctuations in light and temperature patterns. To address this challenge, we show how our sensor placements can be made robust against such environmental changes.

Figure 2: (a) Measured temperature covariance between node 41 and other nodes in the deployment; (b) predicted covariance using non-stationary GP; (c) predicted temperatures for sensor readings taken at noon on February 28th 2004; (d) variance of this prediction.

In  summary,  our  main  contributions  are: 

• A unified method for learning a probabilistic model of the underlying phenomenon and for the expected communication cost between any two locations from a small, short-term initial deployment. These models, based on Gaussian Processes, allow us to avoid strong assumptions previously made in the literature.

• A novel and efficient algorithm for Sensor Placements at Informative and cost-Effective Locations (pSPIEL). Exploiting the concept of submodularity, this algorithm is guaranteed to provide near-optimal placements for this hard problem.

• An extension to our algorithm that allows us to obtain placements that are robust against changes in the environment.

• A complete solution for collecting data, learning models, optimizing and analyzing sensor placements, realized on Tmote Sky motes, which combines all our proposed methods.

• Extensive evaluations of our proposed methods on temperature and light prediction tasks, using data from real-world sensor network deployments, as well as on a precipitation prediction task in the Pacific Northwest.


2  Problem  statement 


In this section, we briefly introduce the two fundamental quantities involved in optimizing sensor placements. A sensor placement is a finite subset of locations $A$ from a ground set of possible sensor locations $V$. Any possible placement is assigned a sensing quality $F(A) \geq 0$ and a communication cost $c(A) \geq 0$, where the functions $F$ and $c$ will be defined presently. We will use a temperature prediction task as a running example: our goal is to deploy a network of wireless sensors in a building in order to monitor the temperature field, e.g., to actuate the air conditioning or heating system. Here, the sensing quality refers to our temperature prediction accuracy, and the communication cost depends on how efficiently the sensors communicate with each other. More generally, we investigate optimization problems of the form

$$\min_{A \subseteq V} c(A) \quad \text{subject to} \quad F(A) \geq Q, \tag{1}$$

for some quota $Q > 0$, which denotes the required amount of certainty achieved by any sensor placement. This optimization problem aims at finding the minimum cost placement that provides a specified amount of certainty $Q$, and is called the covering problem. We also address the dual problem of solving

$$\max_{A \subseteq V} F(A) \quad \text{subject to} \quad c(A) \leq B, \tag{2}$$

for some budget $B > 0$. This optimization problem aims at finding the most informative placement subject to a budget on the communication cost, and is called the maximization problem. In practice, one would often want to specify a particular location $s \in V$ that must be contained in the solution $A$. This requirement arises, for example, if the deployed network needs to be connected to a base station that is positioned at a fixed location $s$. We call the optimization problem (1) (resp. (2)) that includes such an additional constraint a rooted covering (resp. rooted maximization) problem. In this paper, we present efficient approximation algorithms for both the covering and maximization problems, in both the rooted and unrooted formulations.
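
To make the two formulations concrete, the following minimal Python sketch solves problems (1) and (2) by exhaustive enumeration on a toy ground set. The function names `solve_covering` and `solve_maximization`, the coverage-style quality `F`, and the line-distance cost `c` are illustrative assumptions rather than the models used in this paper, and enumeration is exponential in the size of the ground set, so it is only meant for tiny instances.

```python
from itertools import combinations


def all_placements(V):
    """Yield every subset of the ground set V as a frozenset."""
    for r in range(len(V) + 1):
        for A in combinations(V, r):
            yield frozenset(A)


def solve_covering(V, F, c, Q):
    """Problem (1): minimum-cost placement A with F(A) >= Q (None if infeasible)."""
    feasible = [A for A in all_placements(V) if F(A) >= Q]
    return min(feasible, key=c) if feasible else None


def solve_maximization(V, F, c, B):
    """Problem (2): most informative placement A with c(A) <= B."""
    feasible = [A for A in all_placements(V) if c(A) <= B]
    return max(feasible, key=F) if feasible else None


# Toy instance (hypothetical): four locations on a line; a sensor observes
# itself and its immediate neighbours.
V = [0, 1, 2, 3]
observes = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3}}


def F(A):
    """Sensing quality stand-in: number of locations observed by A."""
    return len(set().union(*(observes[s] for s in A)))


def c(A):
    """Communication cost stand-in: length of the line segment spanning A."""
    return max(A) - min(A) if A else 0


cover = solve_covering(V, F, c, Q=4)       # cheapest placement observing all 4
best = solve_maximization(V, F, c, B=1)    # best placement with cost at most 1
```

On this toy instance both problems select the placement {1, 2}, which illustrates how the quota in (1) and the budget in (2) trade off against each other.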


2.1  What  is  sensing  quality? 

In order to quantify how informative a sensor placement is, we have to establish a notion of uncertainty. We associate a random variable $X_s \in X_V$ with each location $s \in V$ of interest; for a subset $A \subseteq V$, let $X_A$ denote the set of random variables associated with the locations $A$. In our temperature measurement example, $V \subset \mathbb{R}^2$ describes the subset of coordinates in the building where sensors can be placed. Our probabilistic model will describe a joint probability distribution $P(X_V)$ over all these random variables. In order to make predictions at a location $s$, we will consider conditional distributions $P(X_s = x_s \mid X_A = x_A)$, where we condition on all observations $x_A$ made by all sensors $A$ in our placement. To illustrate this concept, Figure 2(c) shows the predicted temperature field given the measurements of the 54 sensors we deployed, and Figure 2(d) shows the variance in this distribution.

We use the conditional entropy of these distributions,

$$H(X_s \mid X_A) = -\int_{x_s, x_A} P(x_s, x_A) \log P(x_s \mid x_A) \, dx_s \, dx_A,$$

to assess the uncertainty in predicting $X_s$. Intuitively, this quantity expresses how “peaked” the conditional distribution of $X_s$ given $X_A$ is around the most likely value, averaging over all possible observations $X_A = x_A$ the placed sensors can make. To quantify how informative a sensor placement $A$ is, we use the criterion of mutual information:

$$F(A) = I(X_A; X_{V \setminus A}) = H(X_{V \setminus A}) - H(X_{V \setminus A} \mid X_A). \tag{3}$$

This criterion, first proposed by Caselton and Zidek (1984), expresses the expected reduction of entropy of all locations $V \setminus A$ where we did not place sensors, after taking into account the measurements of our placed sensors. Krause et al. (2007) show that mutual information leads to intuitive placements with prediction accuracy superior to alternative approaches. Section 3 explains how we model and learn a joint distribution over all locations $V$ and how to efficiently compute the mutual information. In addition to mutual information, other objective functions $F(A)$ can be used to measure the sensing quality (cf. Krause and Guestrin, 2007, and Section 5.1).


2.2  What  is  communication  cost? 

Since each transmission drains the battery of the deployed sensors, we have to ensure that our sensor placements have reliable communication links, and that the number of unnecessary retransmissions is minimized. If the probability for a successful transmission between two sensor locations $s$ and $t$ is $\theta_{s,t}$, the expected number of retransmissions is $1/\theta_{s,t}$. Since we have to predict the success probability between any two locations $s, t \in V$, we will in general only have a distribution $P(\theta_{s,t})$ with density $p(\theta_{s,t})$ instead of a fixed value for $\theta_{s,t}$. Surprisingly, this uncertainty has a fundamental effect on the expected number of retransmissions. For a simple example, assume that with probability $\frac{1}{2}$ we predict that our transmission success rate is $\frac{3}{4}$, and with probability $\frac{1}{2}$ it is $\frac{1}{4}$. Then, the mean transmission rate would be $\frac{1}{2}$, leading us to assume that the expected number of retransmissions might be 2. In expectation over the success rate however, our expected number of retransmissions becomes $\frac{1}{2} \cdot 4 + \frac{1}{2} \cdot \frac{4}{3} = 2 + \frac{2}{3} > 2$. More generally, the expected number is

$$c_{s,t} = \int_{\theta_{s,t}} \frac{1}{\theta_{s,t}} \, p(\theta_{s,t}) \, d\theta_{s,t}. \tag{4}$$
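
The effect of uncertainty on the expected number of retransmissions can be checked numerically. The short Python sketch below evaluates a discrete analogue of Equation (4) for a two-point distribution with success rates 3/4 and 1/4, each with probability 1/2; the helper name `expected_retransmissions` is an assumption made for illustration.

```python
from fractions import Fraction


def expected_retransmissions(success_rates, probabilities):
    """Discrete analogue of Equation (4): E[1 / theta] for a finite
    distribution over the transmission success rate theta."""
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p / theta for theta, p in zip(success_rates, probabilities))


rates = [Fraction(3, 4), Fraction(1, 4)]
probs = [Fraction(1, 2), Fraction(1, 2)]

# Plugging the mean success rate into 1/theta ignores the uncertainty.
mean_rate = sum(p * theta for theta, p in zip(rates, probs))
naive_cost = 1 / mean_rate

# Averaging 1/theta over the distribution gives a strictly larger value here.
true_cost = expected_retransmissions(rates, probs)
```

Exact rational arithmetic is used so that the comparison between `naive_cost` (2) and `true_cost` (2 + 2/3) is not blurred by rounding.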

Using this formula, we can compute the expected number of retransmissions for any pair of locations. If $V$ is finite, we can model all locations in $V$ as nodes in a graph $\mathcal{G} = (V, E)$, with the edges $E$ labeled by their communication costs. We call this graph the communication graph of $V$. For any sensor placement $A \subseteq V$, we define its cost $c(A)$ as the cost of the minimum cost tree $T$, $A \subseteq T \subseteq V$, connecting all sensors $A$ in the communication graph for $V$.¹ This cost model applies to the common setting where all sensors obtain sensor measurements and send them to a base station. Finding this minimum cost tree $T$ to evaluate the cost function $c(A)$ is called the Steiner tree problem, an NP-complete problem that has very good approximation algorithms (Vazirani, 2003). Our algorithm, pSPIEL, will however not just find an informative placement and then simply add relay nodes, since the resulting cost may be exorbitant. Instead, it simultaneously optimizes sensing quality and communication cost.
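
As an illustration of how the placement cost can be bounded in practice, the following Python sketch uses the classical minimum-spanning-tree heuristic for the Steiner tree problem: it computes shortest-path costs between all pairs of nodes and then builds a spanning tree over the placed sensors in that shortest-path metric, which is known to cost at most twice the optimal Steiner tree. This is a generic textbook heuristic, not necessarily the approximation used inside pSPIEL, and the function names and the toy cost matrix are assumptions.

```python
import math


def metric_closure(cost):
    """All-pairs shortest-path costs (Floyd-Warshall) for a symmetric matrix."""
    n = len(cost)
    d = [row[:] for row in cost]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def approx_placement_cost(cost, placement):
    """Upper bound on c(A): weight of a spanning tree over the placed sensors
    in the shortest-path metric (Prim's algorithm)."""
    d = metric_closure(cost)
    nodes = list(placement)
    if len(nodes) <= 1:
        return 0.0
    in_tree = {nodes[0]}
    total = 0.0
    while len(in_tree) < len(nodes):
        w, v = min(
            ((d[u][x], x) for u in in_tree for x in nodes if x not in in_tree),
            key=lambda pair: pair[0],
        )
        total += w
        in_tree.add(v)
    return total


INF = math.inf
# Hypothetical expected-retransmission costs between four locations.
cost = [
    [0.0, 1.5, 5.0, INF],
    [1.5, 0.0, 1.0, 2.0],
    [5.0, 1.0, 0.0, 1.25],
    [INF, 2.0, 1.25, 0.0],
]

# Routing from 0 to 2 via relay node 1 (1.5 + 1.0) beats the direct link (5.0).
relay_cost = approx_placement_cost(cost, {0, 2})
```

In this toy example the cost of the placement {0, 2} is 2.5 rather than the direct link cost 5.0, because node 1 acts as a relay.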

Note  that  if  we  threshold  all  link  qualities  at  some  specified  cut-off  point,  and  define  the  edge  costs 
between  two  locations  in  the  communication  graph  as  1  if  the  link  quality  is  above  the  cut-off 
point,  and  infinite  if  the  link  quality  is  below  the  cut-off  point,  then  the  communication  cost  of  a 
sensor  placement  is  exactly  (one  less  than)  the  number  of  placed  sensors.  Hence,  in  this  special 
case,  we  can  interpret  the  maximization  problem  (2)  as  the  problem  of  finding  the  most  informative 
connected  sensor  placement  of  at  most  B  +  1  nodes. 


2.3  Overview  of  our  approach 

Having  established  the  notions  of  sensing  quality  and  communication  cost,  we  now  present  an 
outline  of  our  proposed  approach. 

1. We collect sensor and link quality data from an initial deployment of sensors. From this data, we learn probabilistic models for the sensor data and the communication cost. Alternatively, we can use expert knowledge to design such models.

2. These models allow us to predict the sensing quality $F(A)$ and communication cost $c(A)$ for any candidate placement $A \subseteq V$.

3. Using pSPIEL, our proposed algorithm, we then find highly informative placements which (approximately) minimize communication cost. We can approximately solve both the covering and maximization problems.

4. After deploying the sensors, we then possibly add sensors or redeploy the existing sensors, by restarting from Step 2, until we achieve a satisfactory placement. (This step is optional.)

Consider our temperature prediction example. Here, in step 1, we would place a set of motes throughout the building, based on geometrical or other intuitive criteria. After collecting training data consisting of temperature measurements and packet transmission logs, in step 2, we learn probabilistic models from the data. This process is explained in the following sections. Figure 2(c) and Figure 2(d) present examples of the mean and variance of our model learned during this step. As expected, the variance is high in areas where no sensors are located. In step 3, we would then explore the sensing quality tradeoff for different placements proposed by pSPIEL, and select an appropriate one. This placement automatically suggests whether relay nodes should be deployed. After deployment, we can collect more data, and, if the placement is not satisfactory, iterate by repeating from step 2.

¹ In general, the locations $A$ may include distant sensors, requiring us to place relay nodes, which do not sense but only aid communication. It can occur that $c(\{s, t\}) < c_{s,t}$, i.e., it can be more cost effective to route messages from $s$ to $t$ via some other intermediate node than to send them directly from $s$ to $t$.


3  Predicting  sensing  quality 


In order to achieve highly informative sensor placements, we have to be able to predict the uncertainty in sensor values at a location $s \in V$, given the sensor values $x_A$ at some candidate placement $A$. This is an extension of the well-known regression problem (cf. Guestrin et al., 2004), where we use the measured sensor data to predict values at locations where no sensors are placed. The difference is that in the placement problem, we must be able to predict not just sensor values at uninstrumented locations, but rather probability distributions over sensor values. Gaussian Processes are a powerful class of models for making such predictions. To introduce this concept, first consider the special case of the multivariate normal distribution over a set $X_V$ of random variables associated with $n$ locations $V$:

$$P(X_V = x_V) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x_V - \mu)^T \Sigma^{-1} (x_V - \mu)\right).$$

This model has been successfully used, for example, to model temperature distributions (Deshpande et al., 2004), where every location in $V$ corresponds to one particular sensor placed in the building. The multivariate normal distribution is fully specified by providing a mean vector $\mu$ and a covariance matrix $\Sigma$. If we know the values of some of the sensors $A \subseteq V$, we find that for $s \in V \setminus A$ the conditional distribution $P(X_s = x_s \mid X_A = x_A)$ is a normal distribution, whose mean $\mu_{s|A}$ and variance $\sigma^2_{s|A}$ are given by

$$\mu_{s|A} = \mu_s + \Sigma_{sA} \Sigma_{AA}^{-1} (x_A - \mu_A), \tag{5}$$

$$\sigma^2_{s|A} = \sigma_s^2 - \Sigma_{sA} \Sigma_{AA}^{-1} \Sigma_{As}. \tag{6}$$

Hereby, $\Sigma_{sA} = \Sigma_{As}^T$ is a row vector of the covariances of $X_s$ with all variables in $X_A$. Similarly, $\Sigma_{AA}$ is the submatrix of $\Sigma$ containing only the entries relevant to $X_A$, and $\sigma_s^2$ is the variance of $X_s$. $\mu_A$ and $\mu_s$ are the means of $X_A$ and $X_s$ respectively. Hence the covariance matrix $\Sigma$ and the mean vector $\mu$ contain all the information needed to compute the conditional distributions of $X_s$ given $X_A$. The goal of an optimal placement will intuitively be to select the observations such that the posterior variance (6) for all variables becomes uniformly small. If we can make a set of $T$ measurements $x_V^1, \dots, x_V^T$ of all sensors $V$, we can estimate $\Sigma$ and $\mu$, and use them to compute predictive distributions for any subsets of variables. However, in the sensor placement problem, we must reason about the predictive quality of locations where we do not yet have sensors, and thus need to compute predictive distributions, conditional on variables for which we do not have sample data.
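
The conditioning formulas (5) and (6) translate directly into a few lines of linear algebra. The sketch below is a minimal NumPy illustration; the function name `condition_gaussian` and the three-location mean vector and covariance matrix are made-up values for demonstration, not data from the deployment.

```python
import numpy as np


def condition_gaussian(mu, Sigma, s, A, x_A):
    """Mean and variance of X_s given X_A = x_A, following (5) and (6).

    mu, Sigma : mean vector and covariance matrix over all locations
    s         : index of the location to predict
    A         : list of indices of observed locations
    x_A       : observed values at the locations in A
    """
    A = list(A)
    Sigma_AA = Sigma[np.ix_(A, A)]
    Sigma_sA = Sigma[s, A]
    # Solve a linear system instead of forming the explicit inverse.
    # Because Sigma_AA is symmetric, w equals Sigma_AA^{-1} Sigma_As.
    w = np.linalg.solve(Sigma_AA, Sigma_sA)
    mean = mu[s] + w @ (np.asarray(x_A, dtype=float) - mu[A])
    var = Sigma[s, s] - w @ Sigma_sA
    return mean, var


# Hypothetical temperatures (in degrees) at three locations.
mu = np.array([20.0, 21.0, 22.0])
Sigma = np.array([
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
])

# Predict location 1 after observing 22 degrees at location 0.
mean, var = condition_gaussian(mu, Sigma, s=1, A=[0], x_A=[22.0])
```

Here the observation at location 0 is two degrees above its mean, so the prediction at the strongly correlated location 1 moves up by 0.8 · 2 = 1.6 degrees, and its variance drops from 1 to 1 − 0.8² = 0.36.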


Gaussian Processes are a solution for this dilemma. Technically, a Gaussian Process (GP) is a joint distribution over a (possibly infinite) set of random variables, such that the marginal distribution over any finite subset of variables is multivariate Gaussian. In our temperature measurement example, we would associate a random variable $X(s)$ with each point $s$ in the building, which can be modeled as a subset $V \subset \mathbb{R}^2$. The GP $X(\cdot)$, which we will refer to as the sensor data process, is fully specified by a mean function $\mathcal{M}(\cdot)$ and a symmetric positive definite kernel function $\mathcal{K}(\cdot, \cdot)$, generalizing the mean vector and covariance matrix in the multivariate normal distribution: for any random variable $X(s) \in \mathcal{X}$, $\mathcal{M}(s)$ will correspond to the mean of $X(s)$, and for any two random variables $X(s), X(t) \in \mathcal{X}$, $\mathcal{K}(s, t)$ will be the covariance of $X(s)$ and $X(t)$. This implies that for any finite subset $A = \{s_1, s_2, \dots, s_m\}$, $A \subseteq V$ of location variables, the covariance matrix $\Sigma_{AA}$ of the variables $X_A$ is obtained by

$$\Sigma_{AA} = \begin{pmatrix} \mathcal{K}(s_1, s_1) & \mathcal{K}(s_1, s_2) & \cdots & \mathcal{K}(s_1, s_m) \\ \mathcal{K}(s_2, s_1) & \mathcal{K}(s_2, s_2) & \cdots & \mathcal{K}(s_2, s_m) \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{K}(s_m, s_1) & \mathcal{K}(s_m, s_2) & \cdots & \mathcal{K}(s_m, s_m) \end{pmatrix},$$

and its mean is $\mu_A = (\mathcal{M}(s_1), \mathcal{M}(s_2), \dots, \mathcal{M}(s_m))$. Using formulas (5) and (6), the problem of computing predictive distributions is reduced to finding the mean and covariance functions $\mathcal{M}$ and $\mathcal{K}$ for the phenomena of interest. In general, this is a difficult problem: we want to estimate these infinite objects from a finite amount of sample data. Consequently, strongly limiting assumptions are often made: it is assumed that the covariance of any two random variables depends only on their relative displacement and not on their absolute locations (stationarity), or even only on their distance (isotropy). A kernel function often used is the Gaussian kernel


$$\mathcal{K}(s, t) = \exp\left(-\frac{\|s - t\|_2^2}{h^2}\right). \tag{7}$$
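
As a small illustration of how a kernel function induces a covariance matrix over a finite set of locations, the following NumPy sketch evaluates an isotropic Gaussian kernel of the common form exp(−‖s − t‖² / h²) on a handful of 2D coordinates. The bandwidth parameter `bandwidth` (written h here), the function name, and the coordinates are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel_matrix(locations, bandwidth):
    """Covariance matrix with entries exp(-||s_i - s_j||^2 / bandwidth^2)."""
    X = np.asarray(locations, dtype=float)
    diff = X[:, None, :] - X[None, :, :]      # pairwise displacement vectors
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared Euclidean distances
    return np.exp(-sq_dist / bandwidth ** 2)


# Three hypothetical sensor coordinates in the plane.
locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gaussian_kernel_matrix(locations, bandwidth=2.0)
```

The resulting matrix is symmetric with ones on the diagonal, and its entries depend only on the distance between locations, which is exactly the isotropy assumption discussed in the text.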

These isotropy or stationarity assumptions lead to problems similar to those encountered in the approach using geometric sensing regions, as spatial inhomogeneities such as walls, windows, reflections, etc. are not taken into account. These inhomogeneities are, however, prevalent in real data sets, as indicated in Figure 2(a).

In this paper, we do not make these limiting assumptions. We use an approach to estimate nonstationarity proposed by Nott and Dunsmuir (2002). Their method estimates several stationary GPs with kernel functions as in (7), each providing a local description of the nonstationary process around a set of reference points. These reference points are chosen on a grid or near the likely sources of nonstationary behavior. The stationary GPs are combined into a nonstationary GP, whose covariance function interpolates the empirical covariance matrix estimated from the initial sensor deployment, and near the reference points behaves similarly to the corresponding stationary process. Figure 2(b) shows a learned nonstationary GP for our temperature data. We refer the reader to Nott and Dunsmuir (2002) for more details. Note that as non-parametric models, given enough sensor data, GPs can model very complex processes, including phenomena decaying as inverse polynomial laws, etc. (Rasmussen and Williams, 2006).

Once we have obtained estimates for the mean and covariance functions, we can use these functions to evaluate the mutual information criterion. In order to evaluate Equation (3), we need to compute conditional entropies $H(X_s \mid X_A)$, which involve integrals over all possible assignments $x_A$ to the placed sensors. Fortunately, there is a closed form solution: we find that

$$H(X_{V \setminus A} \mid X_A) = \frac{1}{2} \log\left((2\pi e)^{|V \setminus A|} \, |\Sigma_{V \setminus A \mid A}|\right),$$

hence it only depends on the determinant of the predictive covariance matrix $\Sigma_{V \setminus A \mid A}$. Hereby, $\Sigma_{V \setminus A \mid A}$ can be inferred using Equation (6). Details on efficient computation are described, e.g., by Krause et al. (2007).
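
The closed form above makes the mutual information criterion (3) straightforward to evaluate for a given covariance matrix. The NumPy sketch below is a direct, unoptimized illustration under the assumption that locations are indexed 0, ..., n−1 and that the full covariance matrix is available; the function names are hypothetical, and the efficient computation referenced above is not reproduced here.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy 0.5 * log((2*pi*e)^n * det(cov)) of a Gaussian."""
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)


def conditional_covariance(Sigma, B, A):
    """Covariance of X_B given X_A (matrix version of Equation (6))."""
    S_BB = Sigma[np.ix_(B, B)]
    S_BA = Sigma[np.ix_(B, A)]
    S_AA = Sigma[np.ix_(A, A)]
    return S_BB - S_BA @ np.linalg.solve(S_AA, S_BA.T)


def mutual_information(Sigma, A):
    """F(A) = H(X_{V minus A}) - H(X_{V minus A} | X_A) for a Gaussian model."""
    A = sorted(set(A))
    B = [i for i in range(Sigma.shape[0]) if i not in A]
    if not A or not B:
        return 0.0
    prior = gaussian_entropy(Sigma[np.ix_(B, B)])
    posterior = gaussian_entropy(conditional_covariance(Sigma, B, A))
    return prior - posterior


# Two locations with correlation 0.8 (illustrative values).
Sigma2 = np.array([[1.0, 0.8], [0.8, 1.0]])
mi = mutual_information(Sigma2, [0])
```

For two unit-variance variables with correlation ρ, the result should match the textbook value −½ log(1 − ρ²), which gives a quick sanity check of the implementation.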

In addition to the mutual information criterion, our approach applies to a variety of other criteria, discussed in more detail in Section 5.1. Also note that in order to address phenomena that change over time, one can replace the spatial model $P(X_V)$ with a spatio-temporal model. In this case, we associate with every location $s \in V$ the set of all measurements that will be made at this location over time, similarly to the approach by Meliou et al. (2007).


4  Predicting  communication  cost 


As discussed in Section 2.2, an appropriate measure for communication cost is the expected number of retransmissions. If we have a probability distribution $P(\theta_{s,t})$ over transmission success probabilities $\theta_{s,t}$, Equation (4) can be used in a Bayesian approach to compute the expected number of retransmissions. The problem of determining such predictive distributions for transmission success probabilities is very similar to the problem of estimating predictive distributions for the sensor values as discussed in Section 3, suggesting the use of GPs for predicting link qualities. A closer look however shows several qualitative differences: when learning a model for sensor values, samples from the actual values can be obtained. In the link quality case however, we can only determine whether certain messages between nodes were successfully transmitted or not. Additionally, transmission success probabilities are constrained to be between 0 and 1. Fortunately, GPs can be extended to handle this case as well (Csato et al., 2000). In this classification setting, the predictions of the GP are transformed by the sigmoid, also called link function, $f(x) = \frac{1}{1 + \exp(-x)}$. For large positive values of $x$, $f(x)$ is close to 1, for large negative values it is close to 0, and $f(0) = \frac{1}{2}$.
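
The properties of the sigmoid link function f(x) = 1 / (1 + exp(−x)) stated above are easy to verify numerically. The following Python sketch is a minimal, numerically stable implementation; splitting on the sign of the argument is an implementation choice made here to avoid overflow and is not something prescribed by the paper.

```python
import math


def sigmoid(x):
    """Sigmoid link function f(x) = 1 / (1 + exp(-x)).

    The two branches are algebraically identical; each one only ever
    exponentiates a non-positive number, so math.exp cannot overflow.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Values of the latent GP are mapped to transmission success probabilities.
latent_values = [-5.0, -1.0, 0.0, 1.0, 5.0]
success_probabilities = [sigmoid(w) for w in latent_values]
```

The mapping is monotone and symmetric around zero, so a latent value of 0 corresponds to a transmission success probability of exactly 1/2.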

Since we want to predict link qualities for every pair of locations in $V$, we define a random process
$\Theta(s,t) = f(W(s,t))$, where $W(s,t)$ is a GP over $(s,t) \in V^2$. We call $\Theta(s,t)$ the link
quality process. This process can be learned in the following way. In our initial deployment, we let each
sensor broadcast a message once every epoch, containing its identification number. Each sensor also
records from which other sensors it has received messages in this epoch. This leads to a collection of
samples of the form $(s_{i,k}, s_{j,k}, \theta_k(s_i, s_j))_{i,j,k}$, where $i, j$ range over the deployed
sensors, $k$ ranges over the epochs of data collection, and $\theta_k(s_i, s_j)$ is 1 if node $i$ received
the message from node $j$ in epoch $k$, and 0 otherwise. We will interpret $\theta_k(s_i, s_j)$ as samples
from the link quality process $\Theta(\cdot, \cdot)$. Using these samples, we want to compute predictive
distributions similar to those described in Equations (5) and (6). Unfortunately, in the classification
setting, the predictive distributions can no longer be computed in closed form, but one can resort to
approximate techniques (cf. Csato et al., 2000). Using these techniques, we infer the link qualities by
modeling the underlying GP $W(s,t)$. Intuitively, the binary observations will be converted to
“hallucinated” observations of $W(s,t)$, such that $\Theta(s,t) = f(W(s,t))$ will correspond to the
empirical transmission probabilities between locations $s$ and $t$. We can now use Equations (5) and (6)
to compute the predictive distributions of $W(s,t)$ for any pair of locations $(s,t) \in V^2$. Applying the
sigmoid transform will then result in a probability distribution over transmission success probabilities.
In our implementation, instead
of parameterizing $W(s,t)$ by pairs of coordinates, we use the parametrization $W(t - s, s)$. The
first component of this parametrization is the displacement the successful or unsuccessful message
has traveled, and the second component is the actual set of physical coordinates of the transmitting
sensor. This parametrization tends to exhibit better generalization behavior, since the distance to
the receiver (component 1) is the dominating feature, when compared to the spatial variation in link
quality. Figure 1(c) shows an example of the predicted link qualities using a GP for our indoors
deployment. Figure 1(d) shows the variance in this estimate.

Input: Locations $C \subseteq V$
Output: Greedy sequence $g_1, g_2, \ldots, g_{|C|}$, $C_i = \{g_1, \ldots, g_i\}$
begin
    $C_0 \leftarrow \emptyset$;
    for $j = 1$ to $|C|$ do
        $g_j \leftarrow \operatorname{argmax}_{g \in C \setminus C_{j-1}} F(C_{j-1} \cup \{g\}) - F(C_{j-1})$;
        $C_j \leftarrow C_{j-1} \cup \{g_j\}$;
    end
end
Algorithm 1: Greedy algorithm for maximizing mutual information.

What is left to do is to compute the expected number of retransmissions, as described in formula
(4). Assuming the predictive distribution for $W(s,t)$ is normal with mean $\mu$ and variance $\sigma^2$, we
compute $\int \frac{1}{f(x)} \mathcal{N}(x; \mu, \sigma^2)\,dx = 1 + \exp(-\mu + \sigma^2/2)$, where
$\mathcal{N}(\cdot; \mu, \sigma^2)$ is the normal density with mean $\mu$ and variance $\sigma^2$. Hence we
have a closed form solution for this integral. If $\sigma^2 = 0$, we simply recover the fact that the
expected number of retransmissions is the inverse of the transmission success probability. If $\sigma^2$ is
very large, however, the expected number of retransmissions increases drastically.
This implies that even if we predict the transmission success probability to be reasonably high, e.g.,
2/3, if we do not have enough samples to back up this prediction and hence our predictive variance
$\sigma^2$ is very large, we necessarily have to expect the worst for the number of retransmissions. So,
using this GP model, we may determine that it is better to select a link with success probability 1/3,
about which we are very certain, than a link with a higher success probability, but about which we are
very uncertain. Enabling this tradeoff is a great strength of using GPs for predicting communication
costs. Note that instead of using GPs, any other method for quantifying communication cost
between arbitrary pairs of locations can be used in our approach as well.
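
The tradeoff between a certain poor link and an uncertain good link can be made concrete with a small sketch. It assumes the logistic link function $f(x) = 1/(1 + \exp(-x))$ and uses the log-normal mean identity $E[\exp(-W)] = \exp(-\mu + \sigma^2/2)$ for $W \sim \mathcal{N}(\mu, \sigma^2)$. The function names and the variance value 4.0 are illustrative choices, not taken from our implementation.

```python
import math

def sigmoid(x):
    """Link function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of the sigmoid: the latent value mu with f(mu) = p."""
    return math.log(p / (1.0 - p))

def expected_retransmissions(mu, sigma2):
    """E[1 / f(W)] for W ~ N(mu, sigma2).

    Since 1 / f(x) = 1 + exp(-x), the expectation is 1 + E[exp(-W)],
    and E[exp(-W)] = exp(-mu + sigma2 / 2) is the mean of a log-normal variable.
    """
    return 1.0 + math.exp(-mu + 0.5 * sigma2)

# Hypothetical links: a poor but well-sampled one, and a good but poorly sampled one.
certain_poor = expected_retransmissions(logit(1.0 / 3.0), sigma2=0.0)
uncertain_good = expected_retransmissions(logit(2.0 / 3.0), sigma2=4.0)
```

With zero variance the cost is exactly the inverse success probability (3 retransmissions for probability 1/3), whereas the nominally better link becomes more expensive once its latent variance is large enough.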


5  Problem  structure  in  sensor  placement  optimization 


We now address the covering and maximization problems described in Section 2. We will consider
a discretization of the space into finitely many points $V$, e.g., points lying on a grid. For each pair of
locations in $V$, we define the edge cost as the expected number of retransmissions required to send
a message between these nodes (since link qualities are asymmetric, we use the worse direction
as the cost). The set of edges that have finite cost is denoted by $E$. The challenge in solving the
optimization problems (1) and (2) is that the search space, i.e., the set of possible subsets $A \subseteq V$,
is exponential; more concretely, the problem is easily seen to be NP-hard as a corollary to the hardness of
the unconstrained optimization problem (Krause et al., 2007; Kar and Banerjee, 2003). Given this,
we seek an efficient approximation algorithm with strong performance guarantees. In Section 6,
we present such an algorithm. The key to finding good approximate solutions is understanding and
exploiting problem structure.

Intuitively, we expect that in many cases, the sensor placement problem satisfies the following
diminishing returns property: The more sensors already placed, the less the addition of a new sensor
helps us. This intuition is formalized by the concept of submodularity: A set function $F$ defined on
subsets of $V$ is called submodular (cf. Nemhauser et al., 1978), if

$$F(A \cup \{s\}) - F(A) \geq F(B \cup \{s\}) - F(B), \tag{8}$$

for all $A \subseteq B \subseteq V$ and $s \in V \setminus B$. The function $F$ is called monotonic if
$F(A) \leq F(B)$ for all $A \subseteq B \subseteq V$. Note that the rate at which diminishing returns occur
can vary across the sensing domain. For example, in our temperature prediction example, there could be two
types of rooms in the building: large rooms where the temperature varies smoothly, and thus a small number of
measurements would allow accurate predictions; and small rooms, where temperature fluctuates
more rapidly due to external influences such as outside temperature, different appliances, etc. If
the temperature were, for example, represented by a probabilistic model, one may imagine that the
correlations are more far-reaching in the large rooms, and narrower in the smaller rooms. Thus, in
the large rooms, the first measurement provides high sensing quality, and the incremental benefits
decrease quickly afterwards. In the smaller rooms, the first measurement provides low utility (as
it helps prediction only in a small area), but the next measurements provide significant additional
information, so diminishing returns set in later. There are certainly sensing problems that are
not submodular (e.g., Krause and Guestrin (2009)), where a strong increase in sensing quality can
be achieved only by placing multiple sensors. As we show in Section 5.1, however, many practical
sensing quality functions are provably submodular.
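
For small ground sets, condition (8) and monotonicity can be checked by brute force. The following sketch is only an illustration: the helper names and the two toy set functions are assumptions, and the exhaustive enumeration is exponential in $|V|$.

```python
from itertools import combinations

def subsets(items):
    """Yield all subsets of items as frozensets."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_submodular(F, V, tol=1e-12):
    """Check F(A + s) - F(A) >= F(B + s) - F(B) for all A <= B <= V, s not in B."""
    V = frozenset(V)
    for B in subsets(V):
        for A in subsets(B):
            for s in V - B:
                gain_small = F(A | {s}) - F(A)
                gain_large = F(B | {s}) - F(B)
                if gain_small < gain_large - tol:
                    return False
    return True

def is_monotonic(F, V, tol=1e-12):
    """Check F(A) <= F(B) for all A <= B <= V."""
    for B in subsets(V):
        for A in subsets(B):
            if F(A) > F(B) + tol:
                return False
    return True

# Toy set functions on a ground set of four locations (illustrative only).
capped = lambda A: min(len(A), 2)    # gains 1, 1, 0, 0: diminishing returns
squared = lambda A: len(A) ** 2      # gains 1, 3, 5, 7: increasing returns
```

Here `capped` passes both checks, while `squared` is monotonic but violates (8).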

In addition to submodularity, the sensing quality exhibits another important locality property:
Sensors which are very far apart are approximately independent. This implies that if we consider
placing a subset of sensors $A_1$ in one area of the building, and $A_2$ in another area, then
$F(A_1 \cup A_2) \approx F(A_1) + F(A_2)$. More formally, we say two sets $A_1$ and $A_2$ have distance

$$d(A_1, A_2) = \min_{s \in A_1,\, t \in A_2} c(\{s, t\}).$$

We will abstract out the locality property to assume that there are constants $r > 0$ and $0 < \gamma \leq 1$,
such that for any subsets of nodes $A_1$ and $A_2$ such that $d(A_1, A_2) \geq r$ it holds that
$F(A_1 \cup A_2) \geq F(A_1) + \gamma F(A_2)$. Such a submodular function $F$ will be called
$(r, \gamma)$-local. Note that even phenomena that appear to be non-local can often be modeled using local
objective functions. Consider our temperature prediction example. External influences such as outside
temperature can induce correlation between all sensing locations. However, such external influences can
often be modeled by subtracting an appropriately chosen mean function (which, e.g., models the mean
temperature at different locations during different times of the day) from the GP (Rasmussen and
Williams, 2006). In this case, the correlations between the deviations from the mean function (and
therefore the sensing quality function) are typically local.

5.1  Examples of $(r, \gamma)$-local submodular functions

There are several important examples of monotonic, submodular and $(r, \gamma)$-local objective
functions:


Geometric coverage. Suppose that with each location $s \in V$, we associate a sensing region
$B_s \subseteq V$. Then the function

$$F(A) = \Big| \bigcup_{s \in A} B_s \Big|$$

measures the size of the region covered by placing sensors at locations $A$. In this case, $F$ is
monotonic, submodular and $(r, 1)$-local, where $r = 2 \max_s \max_{i,j \in B_s} d(i, j)$ is twice the
maximum diameter of the sensing regions. This objective function $F$ captures the commonly used disk
model.
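
As an illustration of the disk model, the sketch below evaluates the coverage objective on a one-dimensional grid. The grid, the sensing radius and the helper names are assumptions made for the example.

```python
def sensing_region(s, V, radius):
    """B_s: all grid points within the sensing radius of location s."""
    return {t for t in V if abs(t - s) <= radius}

def coverage(A, V, radius):
    """F(A): number of grid points covered by sensors placed at the locations in A."""
    covered = set()
    for s in A:
        covered |= sensing_region(s, V, radius)
    return len(covered)

# Hypothetical setup: 20 points on a line, sensing radius 2 (region diameter 4, so r = 8).
V = range(20)
RADIUS = 2
A1, A2 = {2, 3}, {15}   # the two sets are 12 >= r apart
```

Because `A1` and `A2` are farther apart than $r$, their regions cannot overlap and the coverage of the union is exactly the sum of the individual coverages. Within `A1`, the second sensor adds only one new point, which is the diminishing returns effect.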


Probabilistic detections. Suppose we want to place sensors for event detection (e.g., detecting
fires in buildings), and that a sensor placed at a location $s$ can detect events at distance $d$ with
probability $0 \leq \varphi_s(d) \leq 1$, where $\varphi_s$ is a monotonically decreasing function. Further
suppose that we place sensors at locations $A$. Then, if each sensor detects independently of the others, the
probability of detecting an event happening at location $t \in V$ is

$$F_t(A) = 1 - \prod_{s \in A} \big(1 - \varphi_s(d(s,t))\big).$$

Now let $r = 2\varphi^{-1}(1/2)$, i.e., twice the distance at which detection happens with probability 1/2.
Then $F_t$ is monotonic, submodular and $(r, 1/2)$-local. Now suppose we have a distribution $Q(t)$ over
the possible outbreak locations. Then the expected detection performance

$$F(A) = \sum_t Q(t) F_t(A)$$

is, as a convex combination of monotonic, submodular and $(r, 1/2)$-local objectives, also monotonic,
submodular and $(r, 1/2)$-local.
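
The expected detection objective is straightforward to evaluate. The sketch below assumes an exponentially decaying detection probability and a uniform outbreak distribution on a line. Both are hypothetical choices for illustration, and the sketch does not address the locality constants.

```python
import math

def detect_prob(d, scale=2.0):
    """Hypothetical detection probability: decreasing in distance d, equal to 1 at d = 0."""
    return math.exp(-d / scale)

def detection_at(t, A):
    """F_t(A) = 1 - prod_{s in A} (1 - phi(d(s, t))), assuming independent detections."""
    miss = 1.0
    for s in A:
        miss *= 1.0 - detect_prob(abs(s - t))
    return 1.0 - miss

def expected_detection(A, Q):
    """F(A) = sum_t Q(t) F_t(A) for an outbreak distribution Q given as {location: prob}."""
    return sum(q * detection_at(t, A) for t, q in Q.items())

# Uniform outbreak distribution over ten locations on a line (illustrative).
Q = {t: 0.1 for t in range(10)}
```

Adding a sensor at location 7 helps less once a sensor at location 5 is already present, in line with the submodularity of $F$.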


Mutual information. Suppose $P(X_V)$ is a GP with compact kernel, i.e., there is a constant $r$ such
that $K(s,t) = 0$ whenever $d(s,t) > r/2$. Then the mutual information
$F(A) = H(X_{V \setminus A}) - H(X_{V \setminus A} \mid X_A)$ is $(r, 1)$-local. Even if the kernel is not
compact, mutual information empirically exhibits $(r, \gamma)$-locality (cf. Figure 6(d)).


Expected mean squared prediction error. Another possible choice for the sensing quality
function is the expected reduction in mean squared prediction error (MSE):

$$F(A) = \sum_{s \in V} \int P(x_A) \big[\mathrm{Var}(X_s) - \mathrm{Var}(X_s \mid x_A)\big]\, dx_A,$$

where

$$\mathrm{Var}(X_s \mid x_A) = E\big[(X_s - E(X_s \mid x_A))^2 \mid x_A\big]$$

is the predictive variance at location $s$ after observing $X_A = x_A$. $F(A)$ is always monotonic.
Under the same conditions as for the mutual information criterion, $F(A)$ is $(r, \gamma)$-local. Under
some additional conditions$^2$ on the covariance function $K$, the criterion is also submodular.

Figure 3: Example demonstrating the poor performance of the greedy algorithm.

5.2  The  greedy  algorithm 

With any such monotonic submodular set function $F$, we can associate the following greedy
algorithm: Suppose we would like to find the set of $k$ locations maximizing the sensing quality $F(A)$,
irrespective of the cost $c(A)$. The greedy algorithm starts with the empty set, and at each iteration adds
to the current set $A$ the element $s$ which maximizes the greedy improvement $F(A \cup \{s\}) - F(A)$,
and continues until $A$ has the specified size of $k$ elements. Perhaps surprisingly, if $A_G$ is the set
selected by the greedy algorithm (with $|A_G| = k$) and if $F$ is monotonic submodular with
$F(\emptyset) = 0$, then

$$F(A_G) \geq (1 - 1/e) \max_{A : |A| = k} F(A),$$

i.e., $A_G$ is at most a constant factor $(1 - 1/e)$ worse than the optimal solution (Nemhauser et al.,
1978). Krause et al. (2007) prove that the mutual information criterion is submodular and
approximately monotonic: For any $\varepsilon > 0$, if we choose the discretization fine enough
(polynomially large in $1/\varepsilon$), then the value of the solution obtained by the greedy algorithm is
at least $(1 - 1/e)\,\mathrm{OPT} - \varepsilon$. Algorithm 1 presents the greedy algorithm for mutual
information; for details we refer the reader to Krause et al. (2007).
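
The greedy rule can be sketched in a few lines for an arbitrary set function. The example below uses a small coverage objective instead of mutual information to stay self-contained. The regions, the names and the brute-force comparison are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical sensing regions for four candidate locations.
REGIONS = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 6}}

def coverage(A):
    """Monotonic submodular objective: number of points covered by the chosen regions."""
    covered = set()
    for s in A:
        covered |= REGIONS[s]
    return len(covered)

def greedy_sequence(F, candidates):
    """Return g_1, g_2, ... where each g_j maximizes F(C_{j-1} + g) - F(C_{j-1})."""
    chosen = []
    remaining = sorted(candidates)  # sorted for deterministic tie-breaking
    while remaining:
        base = F(frozenset(chosen))
        best = max(remaining, key=lambda g: F(frozenset(chosen + [g])) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen

def best_subset_value(F, candidates, k):
    """Exhaustive optimum over all subsets of size k (feasible only for tiny instances)."""
    return max(F(frozenset(c)) for c in combinations(sorted(candidates), k))

order = greedy_sequence(coverage, REGIONS)
```

Every prefix of `order` is the greedy set for the corresponding $k$, so a single run yields the solutions for all cardinalities at once.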

Unfortunately, the near-optimality of the greedy algorithm only holds when we do not take
communication cost into account, and does not generalize to the covering and maximization problems
(1) and (2) which we study in this paper. For an illustration, consider Figure 3. In this
illustration, we consider an additive (a special case of a submodular) sensing quality function $F$ defined
on the ground set $V = \{s, t, o_1, \ldots, o_n, g_1, \ldots, g_m\}$, where the sensing quality of a selected set
$A$ of nodes is the sum of the values in square brackets associated with the selected nodes (e.g.,
$F(\{s, o_1, g_1\}) = 3\varepsilon$). Consider the setting where we want to solve the maximization problem with
root $s$ and budget $B$. The optimal solution would be to choose the set
$A^* = \{s, o_1, \ldots, o_{B/(2\varepsilon)}\}$, with value $F(A^*) = (B/\varepsilon - 1)(B/2) + \varepsilon$.
The simple greedy algorithm that ignores cost would first pick $t$, and hence immediately run out of budget,
returning the set $A_G = \{s, t\}$ with total sensing quality $B$. A greedy algorithm that takes the cost into
account, greedily selecting the element

$$s^* = \operatorname{argmax}_{s \in V \setminus A} \frac{F(A \cup \{s\}) - F(A)}{c(A \cup \{s\}) - c(A)}$$

and hence optimizing the benefit/cost ratio of the chosen element $s^*$, would select the set
$A_{GCB} = \{s, g_1, \ldots, g_{B/\varepsilon}\}$ with total value $2B$. Hence, as $\varepsilon \to 0$, the greedy
algorithm performs arbitrarily worse than the optimal solution. In Section 7 we show that this poor
performance of the greedy algorithm actually occurs in practice.

$^2$ Under conditional suppressor-freeness; cf. Das and Kempe (2008) for details.


6  Approximation  algorithm 


In this section, we propose an efficient approximation algorithm for selecting Padded Sensor
Placements at Informative and cost-Effective Locations (pSPIEL). Our algorithm assumes that the
sensing quality function is $(r, \gamma)$-local submodular (as discussed in Section 5). As input, it is given
the discretization $V$ of the sensing domain, the sensing quality function $F(A)$ (for example, based on
the mutual information criterion applied to a GP) and a communication graph $Q = (V, E)$, where
the edges indicate the communication cost (e.g., based on the expected number of retransmissions)
between any two possible sensing locations. Before presenting our results and performance
guarantees, here is an overview of our algorithm.

1. We randomly select a decomposition of the possible locations $V$ into small clusters using
   Algorithm 2 (cf. Figure 4(a), Section 6.1, Gupta et al. (2003)). Nodes close to the “boundary”
   of their clusters are stripped away and hence the remaining clusters are “well-separated”. (We
   prove that not too many nodes are stripped away.) The well-separatedness and the locality
   property of $F$ ensure the clusters are approximately independent, and hence very informative.
   Since the clusters are small, we are not concerned about communication cost within the
   clusters.

2. We use the greedy algorithm (Algorithm 1) within each cluster $i$ to get an order
   $g_{i,1}, g_{i,2}, \ldots, g_{i,n_i}$ on the $n_i$ nodes in cluster $i$. We call $z_i = g_{i,1}$ the center of
   cluster $i$. Create a chain for this cluster by connecting the vertices in this order, with suitably
   chosen costs for each edge $(g_{i,j}, g_{i,j+1})$, as in Figure 4(b). The submodularity of $F$ ensures
   that the first $k$ nodes in this chain are almost as informative as the best subset of $k$ nodes in
   the cluster (Krause et al., 2007).

3. Create a “modular approximation graph” $Q'$ from $Q$ by taking all these chains, and creating
   a fully connected graph on the cluster centers $z_1, z_2, \ldots, z_m$, the first nodes of each chain.
   The edge costs $(z_i, z_{i'})$ correspond to the shortest path distances between $z_i$ and $z_{i'}$, as in
   Figure 4(c).

4. We now need to decide how to distribute the desired quota to the clusters. Hence, we
   approximately solve the Quota-MST problem (for the covering version) or the Budget-MST problem
   (for the maximization problem) on $Q'$ (Garg, 2005; Johnson et al., 2000) (Figure 4(d)).

5. Expand the chosen edges of $Q'$ in terms of the shortest paths they represent in $Q$, as in
   Figure 4(e).

Figure 4: Illustration of our algorithm: (a) presents a padded decomposition into four clusters; (b)
displays the chain in the modular approximation graph associated with cluster 1; (c) shows the
modular approximation graph with chains induced by greedy algorithm and the complete “core”;
(d) the solution of the Quota-MST problem on the modular approximation graph; and (e) is the final
solution after expanding the Quota-MST edges representing shortest paths.

Suppose $n = |V|$ is the number of nodes in $V$, and $A^*$ denotes the optimal set (for the covering or
maximization problem), with cost $\ell^*$. Finally, let $\dim(V, E)$ be the doubling dimension of the data,
which is constant for many graphs (and for costs that can be embedded in low-dimensional spaces),
and is $O(\log n)$ for arbitrary graphs (cf. Gupta et al., 2003). We prove the following guarantee:

Theorem 1. Let $Q = (V, E)$ be a graph and $F$ be an $(r, \gamma)$-local monotone submodular function
on $V$. Suppose $Q$ contains a tree $T^*$ with cost $\ell^*$, spanning a set $A^*$. Then pSPIEL can find a tree $T$
with cost $O(r \dim(V, E)) \times \ell^*$, spanning a set $A$ with expected sensing quality
$F(A) \geq \Omega(\gamma) \times F(A^*)$. The algorithm is randomized and runs in polynomial time. □

In other words, Theorem 1 shows that we can solve the covering and maximization problems (1)
and (2) to provide a sensor placement for which the communication cost is at most a small factor
(at worst logarithmic) larger, and for which the sensing quality is at most a constant factor worse
than the optimal solution. The proof can be found in the Appendix. While the actual guarantee
of our algorithm holds in expectation, running the algorithm a small (polynomial) number of times
will lead to appropriate solutions with arbitrarily high probability. Details on this procedure can be
found in Section 8. In the rest of this section, we flesh out the details of the algorithm, giving more
technical insight and intuition about the performance of our approach.

Input: Graph $(V, E)$, shortest path distance $d(\cdot, \cdot)$, $r > 0$, $\alpha \geq 64 \dim(V, E)$
Output: $(\alpha, r)$-padded decomposition $C = \{C_u : u \in U\}$
begin
    repeat
        $U \leftarrow$ {a random element in $V$};
        while $\exists\, v \in V : \forall u \in U\ d(u, v) > r'$ do $U \leftarrow U \cup \{v\}$;
        $\pi \leftarrow$ random permutation on $U$;
        $R \leftarrow$ uniform at random in $(r', 2r']$;
        foreach $u \in U$ according to $\pi$ do
            $C_u \leftarrow \{v \in V : d(u, v) \leq R$, and $\forall u' \in U$ appearing earlier than $u$ in $\pi$, $d(u', v) > R\}$;
        end
    until at least $\frac{1}{2}$ nodes $r$-padded;
end
Algorithm 2: Algorithm for computing padded decompositions.


6.1  Padded  decompositions 

To exploit the locality property, we would like to decompose our space into “well-separated”
clusters; loosely, an $r$-padded decomposition is a way to do this so that most vertices of $V$ lie in clusters
$C_i$ that are at least $r$ apart. Intuitively, padded decompositions allow us to split the original placement
problem into approximately independent placement problems, one for each cluster $C_i$. This padding
and the locality property of the objective function $F$ guarantee that, if we compute selections
$A_1, \ldots, A_m$ for each of the $m$ clusters separately, then it holds that
$F(A_1 \cup \cdots \cup A_m) \geq \gamma \sum_i F(A_i)$, i.e., we only lose a constant factor. An example is
presented in Figure 4(a).

If we put all nodes into a single cluster, we obtain a padded decomposition that is not very useful. To
exploit our locality property, we want clusters of size about $r$ that are at least $r$ apart. It is difficult
to obtain separated clusters of size exactly $r$, but padded decompositions exist for arbitrary graphs
for cluster sizes a constant $\alpha$ larger, where $\alpha$ is $\Omega(\dim(V, E))$ (Gupta et al., 2003). We want
small clusters, since we can then ignore communication cost within each cluster.

Formally, an $(\alpha, r)$-padded decomposition is a probability distribution over partitions of $V$ into
clusters $C_1, \ldots, C_m$, such that:

(i) Every cluster $C_i$ in the partition is guaranteed to have bounded diameter, i.e.,
$\mathrm{diam}(C_i) \leq \alpha r$.

(ii) Each node $s \in V$ is $r$-padded in the partition with probability at least $\rho$. (A node $s$ is
$r$-padded if all nodes $t$ at distance at most $r$ from $s$ are contained in the same cluster as $s$.)

The parameter $\rho$ can be chosen as a constant (in our implementation, $\rho = \frac{1}{2}$). In this paper, we
use the term padded decomposition to refer both to the distribution, as well as samples from the
distribution, which can be obtained efficiently using Algorithm 2 (Gupta et al., 2003). In pSPIEL, for a
fixed value of the locality parameter $r$, we gradually increase $\alpha$, stopping when we achieve a
partition in which at least half the nodes are $r$-padded. This rejection sampling is the only randomized
part of our algorithm, and, in expectation, the number of required samples is polynomial.
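
The rejection sampling step can be sketched as follows for a generic distance function. This is a simplified illustration of Algorithm 2, not our implementation: the scale `r_prime = alpha * r / 4` is an assumption (chosen so that cluster diameters stay below $\alpha r$), the function and variable names are hypothetical, and the toy example uses a smaller $\alpha$ than the bound required by Algorithm 2.

```python
import random

def sample_padded_decomposition(V, dist, r, alpha, rng, max_tries=100):
    """Rejection-sample a partition of V in which at least half the nodes are r-padded."""
    V = list(V)
    r_prime = alpha * r / 4.0  # ASSUMPTION: this scale for r' is not stated in the text
    for _ in range(max_tries):
        # Grow a set of centers U until every node is within r' of some center.
        U = [rng.choice(V)]
        while True:
            uncovered = [v for v in V if all(dist(u, v) > r_prime for u in U)]
            if not uncovered:
                break
            U.append(uncovered[0])
        rng.shuffle(U)                           # random permutation pi on U
        R = rng.uniform(r_prime, 2.0 * r_prime)  # random cluster radius
        # Each node joins the first center (in permutation order) within distance R.
        center_of = {}
        for u in U:
            for v in V:
                if v not in center_of and dist(u, v) <= R:
                    center_of[v] = u
        # A node is r-padded if its whole r-neighborhood lies in its own cluster.
        padded = {s for s in V
                  if all(center_of[t] == center_of[s] for t in V if dist(s, t) <= r)}
        if 2 * len(padded) >= len(V):
            return center_of, padded
    raise RuntimeError("no sufficiently padded partition found")

# Toy example: 30 locations on a line with unit spacing.
centers, padded_nodes = sample_padded_decomposition(
    range(30), lambda a, b: abs(a - b), r=1, alpha=16, rng=random.Random(0))
```

The returned dictionary maps every node to its cluster center. Nodes that are not in `padded_nodes` are the ones that would be stripped away.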

Our algorithm strips away nodes that are not $r$-padded, suggesting a risk of missing informative
locations. The following Lemma proves that we will not lose significant information in
expectation.

Lemma 2. Consider a submodular function $F(\cdot)$ on a ground set $V$, a set $B \subseteq V$, and a probability
distribution over subsets $A$ of $B$ with the property that, for some constant $\rho$, we have
$\Pr[v \in A] \geq \rho$ for all $v \in B$. Then $E[F(A)] \geq \rho F(B)$. □

The proof of this Lemma appears in the Appendix. Let $A^*$ be the optimal solution for the covering
or maximization problem, and let $\bar{A}^*$ denote the subset of nodes in $A^*$ that are $r$-padded. Lemma 2
proves that, in expectation, the information provided by $\bar{A}^*$ is at most a constant factor $\rho$ worse
than that provided by $A^*$. Since the cost of collecting data from $\bar{A}^*$ is no larger than that of $A^*$,
this lemma shows that our padded decomposition preserves near-optimal solutions.
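
Lemma 2 can be checked numerically on a small instance. The sketch below assumes one particular distribution, in which every element of $B$ is kept independently with probability $\rho$, and computes the expectation exactly by enumeration. The coverage objective and the names are illustrative.

```python
from itertools import combinations

# Hypothetical sensing regions; B is the full set of candidate locations.
REGIONS = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 6}}

def coverage(A):
    """Submodular objective: number of points covered by the chosen regions."""
    covered = set()
    for s in A:
        covered |= REGIONS[s]
    return len(covered)

def expected_value(F, B, rho):
    """Exact E[F(A)] when each element of B is kept independently with probability rho."""
    B = sorted(B)
    total = 0.0
    for size in range(len(B) + 1):
        for combo in combinations(B, size):
            prob = rho ** size * (1.0 - rho) ** (len(B) - size)
            total += prob * F(frozenset(combo))
    return total

rho = 0.5
expectation = expected_value(coverage, REGIONS, rho)
lower_bound = rho * coverage(REGIONS)
```

In this example the expectation is 4.25, comfortably above the bound $\rho F(B) = 3$.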

6.2  The  greedy  algorithm 

After having sampled a padded decomposition, we run the greedy algorithm as presented in
Algorithm 1 on the $r$-padded nodes in each cluster $C_i$, with $k$ set to $n_i$, the number of padded elements
in cluster $C_i$. Let us label the nodes as $g_{i,1}, g_{i,2}, \ldots, g_{i,n_i}$ in the order they are chosen by the
greedy algorithm, and let $C_{i,j} = \{g_{i,1}, \ldots, g_{i,j}\}$ denote the greedy set after iteration $j$. From
Krause et al. (2007) we know that each set $C_{i,j}$ is at most a factor $(1 - 1/e)$ worse than the optimal
set of $j$ padded elements in that cluster. Furthermore, from $(r, \gamma)$-locality and using the fact that the
nodes are $r$-padded, we can prove that

$$F(C_{1,j_1} \cup \cdots \cup C_{m,j_m}) \geq \gamma \sum_{k=1}^{m} F(C_{k,j_k}) \geq \gamma \left(1 - \frac{1}{e}\right) \sum_{k=1}^{m} F(C^*_{k,j_k})$$

for any collection of indices $j_1, \ldots, j_m$, where $C^*_{k,j_k}$ denotes the optimal selection of $j_k$ nodes
within cluster $k$.

6.3  The  modular  approximation  graph  Q' 

In step 3), pSPIEL creates the auxiliary modular approximation graph (MAG) $Q'$ from $Q$.
Intuitively, this MAG will approximate $Q$, such that running the Quota-MST algorithm on it will
decide how many nodes should be picked from each cluster. The nodes of $Q'$ are the greedy
sets $C_{i,j}$. The greedy sets for cluster $i$ are arranged in a chain with edge $e_{i,j}$ connecting $C_{i,j}$
and $C_{i,j+1}$ for every $i$ and $j$. For a set of nodes $B$, if $c_{MST}(B)$ is the cost of a minimum
spanning tree (MST) connecting the nodes in $B$ by their shortest paths, the weight of $e_{i,j}$ in $Q'$ is
the difference in costs of the MSTs of $C_{i,j}$ and $C_{i,j+1}$ (or 0 if this difference becomes negative),
i.e., $c(e_{i,j}) = \max\left[c_{MST}(C_{i,j+1}) - c_{MST}(C_{i,j}), 0\right]$. We also associate a “reward”
$\mathrm{reward}(C_{i,j}) = F(C_{i,j}) - F(C_{i,j-1})$ with each node, where $F(C_{i,0}) = 0$. Note that, by
telescopic sum, the total reward of the first $k$ elements in chain $i$ is $F(C_{i,k})$, and the total cost of the
edges connecting them is $c_{MST}(C_{i,k})$, which is at most 2 times the cost of a minimum Steiner tree
connecting the nodes in $C_{i,k}$ in the original graph $Q$. By property (i) of the padded decomposition,
$c_{MST}(C_{i,k}) \leq \alpha r k$. By associating these rewards with each node, we define a modular set
function $F'$ on $Q'$, such that for a set $B$ of nodes in $Q'$, its value $F'(B)$ is the sum of the rewards of all
elements in $B$. Figure 4(b) presents an example of a chain associated with cluster 1 in Figure 4(a).
Additionally, we connect every pair of nodes $C_{i,1}, C_{i',1}$ with an edge with cost being the shortest path
distance between $g_{i,1}$ and $g_{i',1}$ in $Q$. This fully connected subgraph is called the core of $Q'$.
Figure 4(c) presents the modular approximation graph associated with the padded decomposition of
Figure 4(a).
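
The rewards and edge costs of a single chain can be computed directly from these definitions. The sketch below assumes a given selection order for one cluster, locations on a line with distance $|a - b|$ standing in for shortest path distances, and a simple coverage objective. All names and values are illustrative.

```python
def mst_cost(nodes, dist):
    """Cost of a minimum spanning tree over nodes under the metric dist (Prim's algorithm)."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0.0
    in_tree = {nodes[0]}
    cost = 0.0
    while len(in_tree) < len(nodes):
        w, v = min((dist(u, x), x) for u in in_tree for x in nodes if x not in in_tree)
        cost += w
        in_tree.add(v)
    return cost

def build_chain(order, F, dist):
    """Rewards F(C_j) - F(C_{j-1}) and edge costs max(c_MST(C_{j+1}) - c_MST(C_j), 0)."""
    rewards, edge_costs = [], []
    for j in range(1, len(order) + 1):
        rewards.append(F(frozenset(order[:j])) - F(frozenset(order[:j - 1])))
        if j < len(order):
            delta = mst_cost(order[:j + 1], dist) - mst_cost(order[:j], dist)
            edge_costs.append(max(delta, 0.0))
    return rewards, edge_costs

def coverage(A):
    """Hypothetical objective: grid points within distance 1 of a selected location."""
    return len({t for s in A for t in (s - 1, s, s + 1)})

order = [5, 1, 9, 4]  # hypothetical selection order within one cluster
rewards, edge_costs = build_chain(order, coverage, lambda a, b: abs(a - b))
```

The rewards telescope to the value of the full set, and the last edge has cost 0 because adding location 4 does not increase the cost of the spanning tree.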

6.4  Solving  the  covering  and  maximization  problems  in  Q' 

The modular approximation graph $Q'$ reduces the problem of optimizing a submodular set function
in $Q$ to one of optimizing a modular set function $F'$ (where the value of a set is the sum of rewards of
its elements) in $Q'$ to minimize communication costs. This is a well studied problem, and constant
factor approximation algorithms have been found for the covering and maximization problems. The
(rooted) Quota-MST problem asks for a minimum weight tree $T$ (with a specified root), in which
the sum of rewards exceeds the specified quota. Conversely, the Budget-MST problem desires a tree
of maximum reward, subject to the constraint that the sum of edge costs is bounded by a budget.
The best known approximation factors for these problems are 2 for rooted Quota-MST (Garg, 2005),
and $3 + \varepsilon$ (for any $\varepsilon > 0$) for unrooted Budget-MST (Levin, 2004). We can use these algorithms
to get an approximate solution for the covering and maximization problems in $Q'$. From Section 6.3,
we know that it suffices to decide which chains to connect, and how deep to descend into each chain;
any such choice will give a subtree of $Q'$. To find this tree, we consider all $C_{i,1}$ for each $i$ as possible
roots, and choose the best tree as an approximate solution. (For the Budget-MST problem, we only
have an unrooted algorithm, but we can use the structure of our modular approximation graph to get
an approximately optimal solution.) Figure 4(d) illustrates such a Quota-MST solution.


6.5  Transferring  the  solution  from  Q'  back  to  Q 

The Quota- or Budget-MST algorithms select a tree $T'$ in $Q'$, which is at most a constant factor
worse than the optimal such tree. We use this solution $T'$ obtained for $Q'$ to select a tree $T \subseteq Q$:
For every cluster $i$, if $C_{i,j} \in T'$ we mark $g_{i,1}, \ldots, g_{i,j}$ in $Q$. We then select $T$ to be an
approximately optimal Steiner tree connecting all marked nodes in $Q$, obtained, e.g., by computing an MST
for the fully connected graph over all marked vertices, where the cost of an edge between $s$ and $t$ is the
shortest path distance between these nodes in $Q$. This tree $T$ is the approximate solution promised
in Theorem 1. (Figure 4(e) presents the expansion of the Quota-MST from Figure 4(d).)
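
The Steiner tree heuristic mentioned above can be sketched as an MST over shortest path distances between the marked nodes. The graph, the node names and the helper functions below are illustrative assumptions. Expanding the selected edges back into paths in the original graph is omitted.

```python
def shortest_paths(nodes, edges):
    """All-pairs shortest path distances (Floyd-Warshall) for an undirected weighted graph."""
    inf = float("inf")
    d = {u: {v: (0.0 if u == v else inf) for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def steiner_mst(marked, d):
    """MST over the marked nodes, with edge costs given by shortest path distances d."""
    marked = sorted(marked)
    if not marked:
        return [], 0.0
    in_tree, tree, cost = {marked[0]}, [], 0.0
    while len(in_tree) < len(marked):
        w, u, v = min((d[u][v], u, v) for u in in_tree for v in marked if v not in in_tree)
        tree.append((u, v))
        cost += w
        in_tree.add(v)
    return tree, cost

# Hypothetical star graph: hub "x" is connected to a, b, c with unit cost.
nodes = ["a", "b", "c", "x"]
edges = {("a", "x"): 1.0, ("b", "x"): 1.0, ("c", "x"): 1.0}
tree, cost = steiner_mst({"a", "b", "c"}, shortest_paths(nodes, edges))
```

In this example the heuristic returns cost 4, whereas the optimal Steiner tree (through the hub) costs 3, which is within the usual factor of 2 for this construction.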

6.6  Additional  implementation  details 

pSPIEL relies heavily on the monotonic submodularity and locality assumptions. In practice, since
we may not know the constants $r$ and $\gamma$, we run the algorithm multiple times with different choices
for $r$. Since the algorithm is randomized, we repeat it several times to achieve a good solution with
high probability. Finally, since we do not know $\gamma$, we cannot directly specify the desired quota
when solving the covering problem. To address all these intricacies, we use the following strategy
to select a good placement: For a fixed number $N$ of iterations, randomly sample an $r$ between 0 and
the diameter of $Q$. Also sample a quota $Q$ between 0 and $Q_{max}$, the maximum submodular function
value achieved by the unconstrained greedy algorithm. Run pSPIEL with these parameters $r$ and
$Q$, and record the actual placement, as well as the communication cost and sensing quality achieved
by the proposed placement. After $N$ iterations, these values result in a cost-benefit curve, which
can be used to identify a good cost-benefit tradeoff as done in Section 7.
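
One simple way to turn the recorded (cost, quality) pairs into a curve is to keep only the non-dominated placements. The sketch below assumes this Pareto filtering, which is one reasonable reading of the procedure and not necessarily what our implementation does. The recorded values are made up for illustration.

```python
def cost_benefit_curve(results):
    """Keep placements for which no cheaper-or-equal placement has at least the same quality."""
    curve = []
    best_quality = float("-inf")
    # Sort by increasing cost; among equal costs, consider the higher quality first.
    for cost, quality in sorted(results, key=lambda cq: (cq[0], -cq[1])):
        if quality > best_quality:
            curve.append((cost, quality))
            best_quality = quality
    return curve

# Hypothetical (communication cost, sensing quality) pairs from repeated runs.
runs = [(10, 3.0), (12, 2.5), (15, 4.0), (15, 3.5), (20, 4.0), (8, 1.0)]
curve = cost_benefit_curve(runs)
```

The resulting list is increasing in both cost and quality, so a tradeoff can be read off by scanning it for the cheapest placement that reaches a desired quality.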

Also, note that a key step of pSPIEL is to run the greedy algorithm in each cluster. Using the
technique of lazy evaluations, originally proposed by Minoux (1978) and applied to mutual information
by Krause et al. (2007), this step can often be drastically sped up.
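
The idea behind lazy evaluations is that, by submodularity, a marginal gain computed in an earlier iteration is an upper bound on the current gain, so most candidates never need to be re-evaluated. The following sketch uses a priority queue of such bounds. It is a generic illustration with a toy coverage objective and hypothetical names, not the implementation used in our experiments.

```python
import heapq

# Hypothetical sensing regions for four candidate locations.
REGIONS = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 6}}

def coverage(A):
    """Monotonic submodular objective: number of points covered by the chosen regions."""
    covered = set()
    for s in A:
        covered |= REGIONS[s]
    return len(covered)

def lazy_greedy(F, candidates, k):
    """Greedy selection of k elements that re-evaluates a gain only when its bound is on top."""
    chosen = []
    current = F(frozenset())
    # Heap entries: (-gain bound, element, number of chosen elements when the bound was computed).
    heap = [(-(F(frozenset([g])) - current), g, 0) for g in candidates]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(chosen) < k:
        neg_gain, g, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # The bound is up to date and dominates all other bounds: g is the greedy choice.
            chosen.append(g)
            current -= neg_gain
        else:
            # Stale bound: recompute the gain for the current set and reinsert.
            gain = F(frozenset(chosen) | {g}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, g, len(chosen)))
    return chosen, evaluations

picked, evaluations = lazy_greedy(coverage, REGIONS, k=2)
```

On this tiny instance the saving is small, but on large candidate sets the stale bounds typically rule out most candidates without any new evaluation of $F$.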


7  Experiments 


In order to evaluate our method, we computed sensor placements for three real-world problems:
indoor illumination measurement, the temperature prediction task as described in our running
example, and the prediction of precipitation in the United States’ Pacific Northwest.


System implementation We developed a complete system implementation of our sensor placement
approach, based on Tmote Sky motes. The data collection from the pilot deployment is based
on the TinyOS SurgeTelos application, which we extended to collect link quality information. Once
per epoch, every sensor sends out a broadcast message containing its unique identifier. Upon
receipt of these messages, every sensor compiles a bitstring indicating from which neighbors it has
heard in the current epoch. This transmission log information is then transmitted, along with
the current sensor readings, via multi-hop routing to the base station. After enough data has been
collected, we learn GP models for sensing quality and communication cost, which are subsequently
used by the pSPIEL algorithm. Our implementation of pSPIEL uses a heuristically improved
approximate k-MST algorithm as described by Johnson et al. (2000). Using pSPIEL, we generate
multiple placements and plot them in a trade-off curve as described in Section 6.6. We then identify
an appropriate trade-off by selecting good placements from this trade-off curve.
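
As an illustration of how the per-epoch transmission logs can be turned into raw link quality estimates, the sketch below computes the empirical reception rate of each directed link. The data layout is an assumption made for this example (one dictionary per epoch mapping a receiver to a bitstring indexed by the position of the sender in `node_ids`), and `reception_rates` is a hypothetical helper; the GP models for link quality are learned from such measurements in a separate step that is not shown here.

```python
def reception_rates(logs, node_ids):
    """Estimate link quality as the fraction of logged epochs in which
    `receiver` heard `sender`.

    logs: list of epochs; each epoch maps a receiver id to a bitstring whose
    j-th character is '1' if the receiver heard node_ids[j] in that epoch.
    Returns a dict keyed by (sender, receiver).
    """
    heard = {(s, r): 0 for s in node_ids for r in node_ids if s != r}
    counted = dict.fromkeys(heard, 0)
    for epoch in logs:
        for receiver, bits in epoch.items():
            for j, sender in enumerate(node_ids):
                if sender == receiver:
                    continue
                counted[(sender, receiver)] += 1
                heard[(sender, receiver)] += bits[j] == "1"
    # Links for which no log reached the base station are left out.
    return {link: heard[link] / counted[link] for link in heard if counted[link] > 0}
```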


Proof-of-concept study As a proof-of-concept experiment, we deployed a network of 46 Tmote
Sky motes in the Intelligent Workplace at CMU. As a baseline deployment, we selected 20 locations
(M20) that seemed to capture the overall variation in light intensity. After collecting the total solar
radiation data for 20 hours, we learned GP models, and used pSPIEL to propose a placement of 19
motes (pS19). Figure 5(a) shows the 20-mote and 19-mote deployments. After deploying the competing
placements, we collected data for 6 hours starting at 12 PM and compared the prediction accuracy
for all placements, on validation data from 41 evenly distributed motes. Figure 5(b) presents the
results. Interestingly, the proposed placement (pS19) drastically reduces the prediction error, by
about 50%. This reduction can be explained by the fact that there are two components in lighting:
natural and artificial. Our baseline deployment placed sensors spread throughout the environment,
and in many intuitive locations near the windows. On the other hand, pSPIEL decided not to
explore the large western area, a part of the lab that was not occupied during the night, and thus
had little fluctuation with artificial lighting. Focusing on the eastern part, pSPIEL was able to make
sufficiently good natural light predictions throughout the lab, and better focus on the sources of
variation in artificial light. We repeated the evaluation for a 12-mote subsample (pS12, Figure 5(a)),
also proposed by pSPIEL, which still provides better prediction than the manual placement of
20 nodes (M20), at significantly lower communication cost. We also compared the communication
cost predicted using the GPs with the measured communication cost. Figure 5(b) shows that
the prediction matches the measurement well. Figures 5(c) and 5(d) show that pSPIEL outperforms
the Greedy heuristic explained below, both in the tradeoff between sensing quality and communication
cost and in predictive RMS error.

(b) Costs and prediction qualities

Metric      M20     pS19    pS12
RMS         91.0    51.2    71.5
MAD         67.0    31.3    45.1
Pred. c.    24.4    19.9    15.3
Real c.     22.9    21.8    15.0

Figure 5: Experimental results. (a) shows the expert placement (M20) as well as two placements
proposed by pSPIEL, (pS19) and (pS12). (b) presents root-mean-square (RMS) and
mean-absolute-deviation (MAD) prediction errors, together with the predicted (Pred. c.) and
measured (Real c.) communication costs, for the manual placement and two placements from
pSPIEL. (c) compares the cost-benefit tradeoff curves for the light data GP on a 187-point grid.
(d) compares the root-mean-square error for the light data.


Indoor temperature measurements In our second set of experiments, we used an existing
deployment (Figure 1(a)) of 52 wireless sensor motes to learn a model for predicting temperature
and communication cost in a building. After learning the GP models from five days of data, we
used pSPIEL to propose improved sensor placements. We compared pSPIEL to three heuristics,
and, for small problems, to the optimal algorithm, which exhaustively searches through all
possible deployments. The first heuristic, Greedy-Connect, runs the unconstrained greedy algorithm
(Algorithm 1), and then connects the selected sensors using a Steiner tree approximation. The
second heuristic, Distance-weighted Greedy, is inspired by an algorithm that provides near-optimal
solutions to the Quota-MST problem (Awerbuch et al., 1999). This heuristic initially starts with all
nodes in separate clusters, and iteratively merges, using the shortest path, the pair of clusters
maximizing the following greedy criterion:


$$\mathrm{gain}(C_1, C_2) = \frac{\min_{i \in \{1, 2\}} \left( F(C_1 \cup C_2) - F(C_i) \right)}{\mathrm{dist}(C_1, C_2)}$$
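
A direct transcription of this criterion into Python may help to make it concrete. The names `merge_gain` and `best_merge`, and the representation of clusters as frozensets, are assumptions made for this sketch; `F` and `dist` are supplied by the caller.

```python
def merge_gain(F, dist, c1, c2):
    """Benefit-cost ratio for merging clusters c1 and c2 (frozensets of nodes)."""
    merged = F(c1 | c2)
    # The benefit of a merge is the smaller of the two improvements.
    benefit = min(merged - F(c1), merged - F(c2))
    return benefit / dist(c1, c2)


def best_merge(F, dist, clusters):
    """Return the pair of distinct clusters with the largest merge gain."""
    pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
    return max(pairs, key=lambda pair: merge_gain(F, dist, *pair))
```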


The intuition for this greedy rule is that it tries to maximize the benefit-cost ratio for merging two
clusters. Since it works near-optimally in the modular case, we would hope it performs well in the
submodular case also. The algorithm stops after sufficiently large components are generated (cf.
Awerbuch et al., 1999). We also compare against the Information Driven Sensor Querying (IDSQ)
approach (Zhao and Guibas, 2004). In IDSQ, a leader node $s_1$ is elected that is able to sense the
monitored phenomenon. Subsequently, sensors $s_j$ are greedily selected for communication with
$s_1$, to maximize a linear combination of incremental utility and communication cost to the leader
node,


$$s_j \in \arg\max_{s} \; \alpha \left[ F(\{s_1, \dots, s_{j-1}, s\}) - F(\{s_1, \dots, s_{j-1}\}) \right] - (1 - \alpha)\, c(\{s_1, s\}).$$

Hereby, $\alpha$ is a parameter varying between 0 and 1 controlling the cost-benefit tradeoff. Selection
stops when no positive net benefit can be achieved. Since all nodes can sense the phenomenon, we
elect the leader node $s_1$ at random and report expected cost and sensing quality over 10 random
trials.
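
The IDSQ baseline as described above can be sketched as follows. This is our reading of the selection rule for illustration purposes, not the original implementation of Zhao and Guibas; the function name `idsq_select` and the interface (a set function `F` on lists and a pairwise cost `cost(leader, s)`) are assumptions.

```python
def idsq_select(F, cost, candidates, leader, alpha):
    """Greedy IDSQ-style selection around a fixed leader node.

    In each round, add the sensor maximizing
        alpha * (marginal utility) - (1 - alpha) * cost(leader, sensor),
    and stop as soon as no sensor has positive net benefit.
    """
    selected = [leader]
    remaining = [s for s in candidates if s != leader]
    while remaining:
        base = F(selected)

        def net_benefit(s):
            return alpha * (F(selected + [s]) - base) - (1 - alpha) * cost(leader, s)

        best = max(remaining, key=net_benefit)
        if net_benefit(best) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

Sweeping `alpha` from 0 to 1 and averaging over randomly chosen leaders yields one tradeoff curve for this baseline.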

Figure 6(a) compares the performance of pSPIEL with the other algorithms on a small problem
with only 16 candidate locations. We used the empirical covariance and link qualities measured
from 16 selected sensors. In this small problem, we could explicitly compute the optimal solution
by exhaustive search. Figure 6(a) indicates that the performance of pSPIEL is significantly closer
to the optimal solution than any of the other approaches. Figure 6(b) presents a comparison of
the algorithms for selecting placements on a 10 x 10 grid. We used our GP models to predict the
covariance and communication cost for this discretization. From Figure 6(b) we can see that for very
low quotas (less than 25% of the maximum), the algorithms performed very similarly. Also, for very
large quotas (greater than 80%), pSPIEL does not significantly outperform Greedy-Connect,
since, when the environment is densely covered, communication is not an issue. In fact, if the
information quota requires a very dense deployment, the padded decomposition tends to strip away
many nodes, leading pSPIEL to increase the locality constant r until r is large enough to include
all nodes in a single cluster. In this case, pSPIEL essentially reverts to the Greedy-Connect
algorithm. In the important region between 25% and 80%, however, pSPIEL clearly outperforms the
heuristics. Our results also indicate that the steepest drop in out-of-sample root-mean-square
(RMS) prediction error occurs in this region. This region corresponds to placements of approximately
10-20 sensors, an appropriate number for the target deployment (Figure 1(a)).

Figure 6: Experimental results. (a) compares trade-off curves for a small subset of the temperature
data, (b) shows tradeoff curves for the temperature GPs on a 10x10 grid, (c) compares tradeoffs for
precipitation data from 167 weather stations, (d) compares the locality parameter r and the loss $\gamma$
incurred by the modular approximation for the temperature GPs.

In order to study the effect of the locality parameter r, we generated padded decompositions for
increasing values of r. For random subsets of the padded nodes, and for placements from pSPIEL,
we then compared the modular approximation, i.e., the sum of the local objective values per cluster,
with the mutual information for the entire set of selected nodes. As r increases to values close to
2, the approximation factor $\gamma$ drastically increases from 0.3 to 0.7 and then flattens as r encompasses
the entire graph $\mathcal{G}$. This suggests that the value r = 2 is an appropriate choice for the locality
parameter, since it only incurs a small approximation loss, but guarantees small diameters of the
padded clusters, thereby keeping communication cost small. For placements proposed by pSPIEL,
the approximation factor is even better.

Precipitation data In our third application, our goal was to place sensors for predicting
precipitation in the Pacific Northwest. Our data set consisted of daily precipitation data collected from
167 regions during the years 1949-1994 (Widmann and Bretherton, 1999). We followed the
preprocessing from Krause et al. (2007). Since we did not have communication costs for this data set,
we assumed that the link quality decayed as the inverse square of the distance, based on physical
considerations. Figure 6(c) compares the tradeoff curves between sensing quality and communication
cost for selecting placements from all 167 locations. pSPIEL outperforms the other approaches up to
very large quotas.


8  Robust  Sensor  Placements 


In Section 7, we have seen that optimized placements can lead to much higher prediction accuracy
and lower communication cost as compared to manual placements. However, such optimization can
have negative effects if the model that the optimization is based on changes. For example, in our
proof-of-concept study, it is conceivable that the building usage patterns change, and the western
part of the building becomes occupied. In this case, the optimized placement will fail to capture
important variations in light distribution. Intuitively, the manual placement (M20) should be able to
capture such a change of the environment better, as the sensors are more uniformly spread out. This
intuitive assessment comes from our prior assumption that, since light spreads uniformly over
space, a regularly spaced placement should be quite informative.

How can we place sensors that perform well both according to the current state of the world and
under possible future changes? One possibility is to require the sensor placement to perform
well both according to our prior assumptions (i.e., favoring uniform placements) and to the data we
have collected so far. We can formalize this idea by defining two separate sensing quality functions,
$F_1$ and $F_2$. $F_1(A)$ measures the informativeness of placement A under the isotropic prior. $F_2$
measures the informativeness according to the collected data, as described before. Assuming a priori
that the phenomenon will always spread uniformly in the environment, we could choose $F_1$ to be the
mutual information of an isotropic Gaussian process, as in Equation (7). Optimizing according to $F_1$
only would lead to sensor placements that are (asymptotically) distributed on a regular grid, and we
would hence expect such placements to be robust against changes in the environment. $F_2$ is the
mutual information according to the complex, data-dependent, nonstationary Gaussian process
considered in the earlier parts of this paper. Optimizing for $F_2$ would lead to placements that exploit
the correlations in the data collected from the pilot deployment. Based on these two objective
functions, we would then like to find a placement A that jointly optimizes $F_1$ and $F_2$, i.e., which
is both robust and exploits correlations estimated from data.

More generally, we would like to solve the robust optimization problem

$$\min_{A} c(A) \quad \text{such that } F_i(A) \ge Q \text{ for all } i, \tag{9}$$

where $F_1, \dots, F_m$ is a collection of monotonic submodular functions. Note that, unlike problem
(1), in problem (9) there are now multiple submodular constraints $F_1(A) \ge Q, \dots, F_m(A) \ge Q$.
The following key idea allows us to reduce problem (9) to problem (1): For each function $F_i$, define
a new truncated objective function

$$F_{i,Q}(A) = \min\{F_i(A), Q\}.$$

It holds that whenever $F_i$ is monotonic and submodular, $F_{i,Q}$ is monotonic and submodular as
well (Fujito, 2000). As a nonnegative linear combination of monotonic submodular functions, the
function

$$\bar{F}_Q(A) = \frac{1}{m} \sum_{i} F_{i,Q}(A)$$

is monotonic submodular as well. Furthermore, it holds that $\bar{F}_Q(A) = Q$ if and only if $F_i(A) \ge Q$
for all i. Hence, instead of problem (9), we can equivalently solve³

$$\min_{A} c(A) \quad \text{such that } \bar{F}_Q(A) \ge Q,$$

which is an instance of problem (1). However, we cannot readily apply pSPIEL to this problem,
since it is only guaranteed to return a solution such that, in expectation, $\bar{F}_Q(A) \ge \beta Q$, for
$\beta = (1 - 1/e)\gamma/2$ (from Theorem 1). Unfortunately, $\bar{F}_Q(A) \ge \beta Q$ does not guarantee that
$F_i(A) \ge \beta Q$ for each i.
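
The truncation construction is easy to state in code. The sketch below is an illustration only; the helper names `truncate` and `averaged_truncation` are ours, and the objective functions are arbitrary callables on sets.

```python
def truncate(F_i, Q):
    """Truncated objective: A -> min(F_i(A), Q)."""
    return lambda A: min(F_i(A), Q)


def averaged_truncation(functions, Q):
    """Average of the truncated objectives; it equals Q exactly when every F_i(A) >= Q."""
    truncated = [truncate(F_i, Q) for F_i in functions]
    return lambda A: sum(f(A) for f in truncated) / len(truncated)
```

The same construction also shows why an approximate guarantee on the average does not transfer to the individual objectives: with two objectives, Q = 2, and a set A with $F_1(A) = 0$ and $F_2(A) \ge 2$, the average is 1 = Q/2, which exceeds $\beta Q$ for any $\beta < 1/2$ even though $F_1(A) = 0$.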

However,  we  can  nevertheless  use  pSPIEL  to  solve  problem  (9).  We  first  present  an  overview  of 
our  algorithm,  and  then  discuss  the  details  in  Section  8.1. 

1. We first convert pSPIEL into an algorithm for which Theorem 1 holds not just in expectation,
but with high probability. For arbitrary $\delta > 0$, this new algorithm is guaranteed to
provide, with probability at least $1 - \delta$, a solution A such that $\bar{F}_Q(A) \ge \beta Q/2$.

2. Call $Q_{\mathrm{togo}} = Q - \bar{F}_Q(A)$ the remaining quota that still needs to be covered. After applying
pSPIEL once, $Q_{\mathrm{togo}} \le Q(1 - \beta/2)$. We show how we can iteratively apply pSPIEL to
a modified objective function to obtain larger and larger sets A such that after k iterations
$Q_{\mathrm{togo}} \le Q(1 - \beta/2)^k$. Hence, after logarithmically many iterations, $Q_{\mathrm{togo}} \le \varepsilon Q/m$. This
implies that $F_i(A) \ge Q(1 - \varepsilon)$ for all i.
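
The last implication in step 2 can be spelled out as follows. The derivation reads the stopping threshold as $\varepsilon Q / m$, which matches the bound $Q(1 - \varepsilon/m)$ on the averaged truncated objective used in Section 8.1; $\bar{F}_Q$ denotes that averaged objective and $F_{j,Q} = \min\{F_j, Q\}$. Since every truncated term is at most Q, all summands below are nonnegative, so for any fixed index j,

$$Q_{\mathrm{togo}} = Q - \bar{F}_Q(A) = \frac{1}{m} \sum_{i=1}^{m} \left( Q - F_{i,Q}(A) \right) \ge \frac{1}{m} \left( Q - F_{j,Q}(A) \right).$$

If $Q_{\mathrm{togo}} \le \varepsilon Q / m$, this gives $Q - F_{j,Q}(A) \le \varepsilon Q$, and therefore

$$F_j(A) \ge F_{j,Q}(A) \ge Q(1 - \varepsilon).$$

The number of iterations follows from the geometric decrease of the remaining quota:

$$Q \left( 1 - \frac{\beta}{2} \right)^{k} \le \frac{\varepsilon Q}{m} \quad \Longleftrightarrow \quad k \ge \frac{\log(m/\varepsilon)}{\log \frac{1}{1 - \beta/2}}.$$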


8.1  Algorithm  details 


We first need to turn pSPIEL into an algorithm that solves problem (1) with high probability $1 - \delta$.
Let M be an upper bound on the optimal value. Then each run of pSPIEL returns a solution $A_i$
with $0 \le F(A_i) \le M$.

Lemma  3.  We  need 


N  = 


1 

2 


1 

6 


samples to guarantee a sample $A_i$ with $\bar{F}_Q(A_i) \ge \beta Q/2$ with probability $1 - \delta$.


3 A similar construction was used by Krause et al. (2007) to develop the SATURATE algorithm for robust optimization
of submodular functions.

In order to guarantee that $\min_i F_i(A) \ge (1 - \varepsilon)Q$, we use the following strategy: Let
$F^{(1)} = \bar{F}_Q$. We invoke random sampling of pSPIEL applied to $F^{(1)}$ until we obtain a solution
$A_1$ such that $F^{(1)}(A_1) \ge \beta Q/2$. We then define a new monotonic submodular function,
$F^{(2)}(A) = \bar{F}_Q(A \cup A_1) - \bar{F}_Q(A_1)$. $F^{(2)}$ is called a residual submodular function. We then
repeatedly run pSPIEL to obtain a solution $A_2$ such that
$F^{(2)}(A_2) \ge \beta (Q - \bar{F}_Q(A_1))/2$. After k steps, we define the function

$$F^{(k+1)}(A) = \bar{F}_Q(A \cup A_1 \cup \dots \cup A_k) - \bar{F}_Q(A_1 \cup \dots \cup A_k),$$

and use pSPIEL to obtain a solution $A_{k+1}$ such that

$$F^{(k+1)}(A_{k+1}) \ge \frac{\beta \left( Q - \bar{F}_Q(A_1 \cup \dots \cup A_k) \right)}{2}.$$


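Note that after k steps, it holds that $Q - \bar{F}_Q(A_1 \cup \dots \cup A_k) \le Q (1 - \tfrac{1}{2}\beta)^k$, and hence after

$$\frac{\log(m/\varepsilon)}{\log \frac{1}{1 - \beta/2}}$$

iterations it holds that

$$\bar{F}_Q(A_1 \cup \dots \cup A_k) \ge Q \left( 1 - \frac{\varepsilon}{m} \right),$$

and hence $F_i(A_1 \cup \dots \cup A_k) \ge Q(1 - \varepsilon)$ for all i. We choose $\delta$ small enough to apply the
union bound over all k trials. Algorithm 3 presents pseudo-code for our approach.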
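
The iterative scheme can also be sketched in executable form. The code below is an illustration under stated assumptions, not our implementation: `solve` is a hypothetical stand-in for one randomized run of pSPIEL on a given objective and quota, `max_tries` is an artificial cap added so that the sketch terminates even if the stand-in solver never reaches the required residual quota, and placements are represented as frozensets.

```python
def robust_placement(functions, Q, eps, beta, solve, max_tries=100):
    """Repeatedly cover a constant fraction of the remaining quota.

    solve(F, quota) is a hypothetical stand-in for one run of pSPIEL; it must
    return an iterable of locations.
    """
    m = len(functions)

    def F_bar(A):
        # Average of the truncated objectives min(F_i(A), Q).
        return sum(min(F(A), Q) for F in functions) / m

    A = frozenset()
    while F_bar(A) < Q * (1 - eps / m):
        base = F_bar(A)
        togo = Q - base  # quota that still needs to be covered

        def residual(B, A=A, base=base):
            # Residual objective: gain of adding B to the current placement A.
            return F_bar(A | B) - base

        for _ in range(max_tries):
            B = frozenset(solve(residual, togo))
            if residual(B) >= beta * togo / 2:
                break
        else:
            raise RuntimeError("solver did not reach the required residual quota")
        A = A | B
    return A
```

On termination, the averaged truncated objective is at least $Q(1 - \varepsilon/m)$, so every individual objective is at least $Q(1 - \varepsilon)$.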


We summarize our analysis in the following theorem:

Theorem 4. Given a graph $\mathcal{G} = (V, E)$, constants $\varepsilon$, $\delta$, Q and $(r, \gamma)$-local monotone submodular
functions $F_1, \dots, F_m$ bounded above by M, we can find a tree T with cost


$$c(T) = O(r \dim(V, E)) \times \log \frac{m}{\varepsilon} \times \ell^*$$


spanning a set A with $F_i(A) \ge Q(1 - \varepsilon)$ for all i. The algorithm is randomized and runs in expected
time  polynomial  in  the  size  of  the  problem  instance  and  polynomial  in  M-.  □ 


Hence, for an arbitrary $\varepsilon > 0$ we can, in expected polynomial time, find a sensor placement with
sensing quality $F_i(A) \ge Q(1 - \varepsilon)$ for all i. The cost of the solution c(T) grows logarithmically in
$m/\varepsilon$, which depends on the number m of objective functions.


8.2  Experiments 

We use our robust version of pSPIEL to make the sensor placement in our proof-of-concept study
more robust. We choose $F_1(A)$ as the mutual information obtained in an isotropic Gaussian process
with kernel (7) and fixed bandwidth h. As $F_2$, we choose the mutual information of the nonstationary
GP learned from the pilot deployment, as in Section 7.

In order to model $F_1$ using an isotropic GP (modeling the uniform spreading of light), we need
to specify the bandwidth parameter h in (7). This bandwidth parameter encodes our smoothness
assumptions about the world. The smaller h, the quicker the assumed correlation decays with
distance, and hence the rougher we assume the phenomenon to be. In addition, the smaller h, the
more sensors we need in order to obtain a high level of mutual information. This means that for
small values of h, the uninformed sensing quality $F_1$ dominates the optimization. For large values
of h, however, fewer sensors are needed to obtain high sensing quality $F_1$, and hence $F_2$ dominates
the objective value. Hence, by varying h, we can vary the amount of robustness. Figure 7 shows
different sensor placements obtained by choosing an increasing bandwidth h. For small bandwidths,
the placements are basically uniformly spread out over the space. For high bandwidths, the robust
placements resemble the non-robust placement pS19 (from Section 7).

Input: Graph (V, E), $F_1, \dots, F_m$, Q, M, $\beta$, $\varepsilon$, $\delta$
Output: Placement A such that $F_i(A) \ge Q(1 - \varepsilon)$ for all i with probability $1 - \delta$.
begin
    $\bar{F}_Q(A) \leftarrow \frac{1}{m} \sum_{i=1}^{m} \min\{F_i(A), Q\}$;
    $F^{(1)} \leftarrow \bar{F}_Q$;
    $A \leftarrow \emptyset$;
    $k \leftarrow 0$;
    while $\bar{F}_Q(A) < Q(1 - \varepsilon/m)$ do
        $k \leftarrow k + 1$;
        $F^{(k)}(A') \leftarrow \bar{F}_Q(A' \cup A) - \bar{F}_Q(A)$;
        $Q_{\mathrm{togo}} \leftarrow Q - \bar{F}_Q(A)$;
        repeat
            $A' \leftarrow$ pSPIEL($F^{(k)}$, $Q_{\mathrm{togo}}$)
        until $F^{(k)}(A') \ge Q_{\mathrm{togo}}\, \beta / 2$;
        $A \leftarrow A \cup A'$;
    end
end

Algorithm 3: Algorithm for robust sensor placements.


We also compare the manual placement (M20) and the non-robust pSPIEL placement (pS19) of
Section 7 with the robust solutions. For each robust placement A, we compute $\min_i F_i(A)$, which
measures how well the placement performs with respect to both the data-driven model $F_2$ and the
uniform prior $F_1$. The placement in Figure 7(b) maximizes this score over all the robust placements,
indicating that Figure 7(b) is a good compromise between the data-driven model and prior
assumptions. Figure 8 compares the manual placement (M20) with both optimized placements. Note that
while the robust placement obtains higher RMS error and communication cost than the non-robust
placement (pS19), it still performs drastically better than the manual placement (M20). Also note
that the robustness scores $\min_i F_i(A)$ of both the robust and the manual placement are higher than
for the non-robust placement (pS19).

Figure 7: Experimental results. (a) shows the non-robust placement, (b-e) show placements
optimized using Algorithm 3, where $F_2$ is the mutual information according to the pilot deployment,
and $F_1$ is the mutual information w.r.t. an isotropic GP with different bandwidths: pS/S 1 9-x refers
to the result when using a bandwidth proportional to x. Notice that with decreasing bandwidths, the
placements become more and more regularly spaced.


9  Modular approximation for other combinatorial problems: Informative Path Planning


The key idea behind pSPIEL was to reduce the problem of maximizing sensing quality, a submodular
function, to the problem of maximizing a modular function on the Modular Approximation
Graph, which we can solve using existing combinatorial algorithms for modular functions. This
idea of reducing a local-submodular optimization to a modular optimization is quite powerful. For
example, we can use the same algorithmic idea to solve other local-submodular combinatorial
optimization problems. In the following, we discuss one such example: applying pSPIEL to
informative path planning.

Figure 8: Experimental results comparing the manual placement with 20 sensors (left) to the robust
(center) and non-robust (right) placements obtained using pSPIEL. For each placement, three bars
are shown, measuring improvement in RMS error, communication cost and robustness compared
to the manual solution as baseline (which is normalized to 1). Higher values are better. Both
the robust and non-robust placements outperform the manual deployment in communication cost
and prediction error. The robust placement also obtains a higher robustness score than the manual
deployment. The robustness score of the non-robust placement is even lower than that of the manual
placement.

Consider the setting where, instead of deploying a wireless sensor network, we use a robot to collect
observations. In this case, we also want to identify locations A of high sensing quality, but the robot
needs to travel between successive sensing locations. We can model this setting by specifying a
graph $\mathcal{G} = (V, E)$ over the sensing locations, and our goal will be to find an informative path
$\mathcal{P} = (a_1, \dots, a_k)$ spanning the locations $A \subseteq \mathcal{P}$. In this setting, rather than modeling the
communication cost, the cost $c(\mathcal{P})$ is the length of the path $\mathcal{P}$, i.e.,


$$c(\mathcal{P}) = \sum_{i=1}^{k-1} c(\{a_i, a_{i+1}\}).$$

Similarly, c(A) is the cost of the shortest path spanning nodes A. Using this modified notion of
cost, we designate specific nodes $s, t \in V$ as starting and ending locations, i.e., require that $a_1 = s$
and $a_k = t$, and solve

$$\max_{\mathcal{P}} F(\mathcal{P}) \quad \text{subject to } c(\mathcal{P}) \le B \text{ and } \mathcal{P} \text{ is an } s\text{-}t \text{ path in } \mathcal{G}. \tag{10}$$
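
To make problem (10) concrete, the following sketch evaluates the path cost and solves tiny instances by exhaustive search. It is meant only as a reference point for small graphs: the helper names `path_cost` and `best_path` are ours, edge costs are stored in a dictionary keyed by frozenset node pairs, and the search is restricted to simple paths, which is a simplification made for this example.

```python
def path_cost(path, edge_cost):
    """Length of a path: the sum of the costs of its consecutive edges."""
    return sum(edge_cost[frozenset((a, b))] for a, b in zip(path, path[1:]))


def best_path(nodes, edge_cost, F, s, t, budget):
    """Exhaustively search simple s-t paths of cost <= budget, maximizing F."""
    best, best_value = None, float("-inf")

    def extend(path, cost):
        nonlocal best, best_value
        if path[-1] == t:
            value = F(path)
            if value > best_value:
                best, best_value = list(path), value
            return
        for v in nodes:
            edge = frozenset((path[-1], v))
            if v not in path and edge in edge_cost and cost + edge_cost[edge] <= budget:
                extend(path + [v], cost + edge_cost[edge])

    extend([s], 0.0)
    return best, best_value
```

The running time of this search is exponential in the number of nodes, which is why approximation algorithms are needed for realistic instances.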

For modular functions F, this problem is known as the s-t orienteering problem (cf. Chekuri
et al., 2008). For the more general case where F is submodular, the only algorithm proposed so far
has quasipolynomial running time (Chekuri and Pal, 2005); it has been extended and used for
informative path planning by Singh et al. (2007).

Using an approximate algorithm for s-t orienteering on the modular approximation graph, we
obtain the first polynomial-time approximation algorithm for $(r, \gamma)$-local submodular orienteering.

Theorem 5. Given a graph $\mathcal{G} = (V, E)$, $s, t \in V$ and an $(r, \gamma)$-local monotone submodular function
F, pSPIEL will find an s-t path $\mathcal{P}$ with cost $O(r \dim(V, E)) \times \ell^*$, spanning a set A with expected
sensing quality $F(A) \ge \Omega(\gamma) \times F(A^*)$. The algorithm is randomized and runs in polynomial
time. □

Singh et al. (2007) proved that any $\eta$-approximate algorithm for submodular orienteering can be
extended to an efficient $(\eta + 1)$-approximate algorithm for the more complex problem of planning
multiple paths (for multiple robots). This result can immediately be used to extend pSPIEL to the
setting of multiple-robot informative path planning.


10  Related  work 


In  this  section,  we  relate  our  approach  to  work  in  several  areas. 


10.1  Sensor  placement  to  monitor  spatial  phenomena 

The problem of selecting observations for monitoring spatial phenomena has been investigated
extensively in geostatistics (cf. Cressie (1991) for an overview), and more generally in (Bayesian)
experimental design (cf. Chaloner and Verdinelli, 1995). Heuristics for actively selecting observations in
GPs in order to achieve high mutual information have been proposed by Caselton and Zidek (1984).
Submodularity has been used to analyze algorithms for placing a fixed set of sensors (Krause et al.,
2007). These approaches, however, do not consider communication cost as done in this paper. The
problem of optimally placing a small number of relay nodes to facilitate communication between
sensors of a deployed network has been studied by a number of researchers (Ergen and Varaiya,
2006; Lloyd and Xue, 2007; Cheng et al., 2008). However, these approaches do not jointly optimize
over the sensor placement and the deployment of relay nodes as considered in this paper.


10.2  Sensor  placement  under  communication  constraints 

Existing work on sensor placement under communication constraints (Gupta et al., 2003; Kar and
Banerjee, 2003; Funke et al., 2004) has considered the problem mainly from a geometric
perspective: Sensors have a fixed sensing region, such as a disc with a certain radius, and can only
communicate with other sensors that are at most a specified distance apart. In addition, it is assumed
that two sensors at fixed locations can either perfectly communicate or not communicate at all. As
argued in Section 1, these assumptions are problematic. Sensor selection considering both the value
of information and the cost of acquiring the information in the context of sensor networks
was first formalized by Zhao et al. (2002). Their Information Driven Sensor Querying (IDSQ)
approach greedily trades off sensing quality and communication cost. While their approach flexibly
accommodates different sensing quality and communication cost functions, their optimization
algorithm does not provide any performance guarantees. Bian et al. (2006) describe an approach for
selecting sensors with submodular and supermodular utility functions, trading off utility and cost.
However, their approach requires that sensors are able to send "fractional" amounts of information.
While this fractional selection can be realistic in some applications, it cannot be used to optimize
sensor placements: In sensor placement, a location is either selected or not selected. The first
version of this paper (Krause et al., 2006) was, to the best of our knowledge, the first approach to
near-optimally place sensor networks under realistic models for both the monitored phenomenon
and the wireless link quality. In contrast to previous approaches, pSPIEL applies to all $(r, \gamma)$-local
submodular sensing quality functions. The present version is significantly extended, providing more
details as well as new empirical and theoretical results (Sections 8 and 9).


10.3  Statistical  models  for  modeling  link  quality 

Often, it is assumed that a transmitting node has perfect connectivity with all nodes inside a given
radius and zero connectivity with nodes outside that disk (Cerpa et al., 2005; Bai et al., 2006).
However, depending on how the disk radius is chosen, such disk models may not capture actual
communication patterns in one particular network. In order to allow more flexibility, Cerpa et al.
(2005) consider a data-driven, probabilistic link model. Like the regular disk model, it assumes that
connectivity is a function only of the geometric distance between motes, but unlike that model, it can
predict a real-valued connectivity value, that is, a probability of packet reception that is not zero
or one. However, their isotropic approach does not adapt to a specific environment (containing
obstacles like walls, doors, furniture, etc.). Cerpa et al. (2005) also study the impact of temporal
autocorrelation on routing decisions. Incorporating such temporal aspects into sensor placement
optimization is an interesting avenue for further research.

In order to account for more complex behavior, physical models such as radio propagation or path loss
equations have been derived from real data; they describe how the signal quality falls off with distance
from a transmitting sensor (Friis, 1946; Rappaport, 2000; Zuniga and Krishnamachari, 2007). These
equations can model complex communication behaviors with parameters encoding the number of walls,
the construction materials of the clutter, multipath signal effects, and microwave interference
(Rappaport, 2000; Morrow, 2004). Unfortunately, this deployment-specific information can be as hard
to model and obtain as the packet transmission data needed for data-driven approaches.

Our link quality model, described in Section 2.2, both captures complex, environment-dependent
behavior and is completely data-driven (i.e., no deployment-specific information needs to be
manually supplied). After the first version of our paper was published (Krause et al., 2006), Ertin
(2007) proposed an approach for learning Gaussian Process models for link quality estimation,
explicitly taking into account the fact that sensor measurements are lost (censored). Note that our
pSPIEL approach can use such alternative approaches for estimating link quality as well.
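
The following Python sketch illustrates, under stated assumptions, why a probabilistic link model matters for cost estimates. It assumes that the cost of a link is the expected number of transmissions until success, 1/θ, for a success probability θ; the sample values are made up for illustration and would in practice be drawn from a learned model such as a GP.

```python
from statistics import mean

def expected_transmissions(theta_samples):
    """Average of 1/theta over plausible success probabilities theta."""
    return mean(1.0 / t for t in theta_samples)

def plug_in_transmissions(theta_samples):
    """Cost obtained when uncertainty is ignored: 1 / (mean theta)."""
    return 1.0 / mean(theta_samples)

# Two hypothetical links with the same mean success probability (0.5).
certain_link = [0.5, 0.5, 0.5, 0.5]
uncertain_link = [0.2, 0.4, 0.6, 0.8]

for name, samples in (("certain", certain_link), ("uncertain", uncertain_link)):
    print(name, plug_in_transmissions(samples), expected_transmissions(samples))
```

Because 1/θ is convex, the link whose quality is uncertain has a higher expected cost than its mean quality alone suggests, even though both links share the same mean.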


10.4  Related  work  on  submodular  optimization 

Problem (1) for an arbitrary integer-valued monotonic submodular function F is called the
polymatroid Steiner tree problem (Calinescu and Zelikovsky, 2005). Calinescu and Zelikovsky (2005)
developed a polylogarithmic approximation algorithm for this problem. However, their approach
does not exploit locality, and hence, if locality is present, leads to approximation guarantees that
are worse than those obtained by pSPIEL (which solves the problem for all (r, γ)-local submodular
functions F). The submodular orienteering problem, i.e., the problem of finding a path of
bounded length maximizing a submodular utility function, was first considered by Chekuri and Pal
(2005), who developed an algorithm with quasipolynomial running time. While providing important
theoretical insights, their approach does not scale to practical sensing problems. Singh et al.
(2007) proposed a spatial decomposition approach as well as branch and bound techniques to
significantly speed up the approach of Chekuri and Pal (2005). They also applied it to informative path
planning in the context of environmental monitoring problems. However, their approach still has
worst-case quasipolynomial running time. The approach presented in Section 9 is the first efficient
(polynomial-time) algorithm for submodular orienteering, in the case where the objective function
F is (r, γ)-local.

The  robust  sensor  placement  problem  (9)  was  first  studied  by  Krause  et  al.  (2007)  for  the  case  of 
finding  the  best  k  sensor  locations.  In  this  paper,  we  extend  their  approach  to  more  complex  cost 
functions  (such  as  communication  cost). 

11  Conclusions 

We proposed a unified approach for robust placement of wireless sensor networks. Our approach
uses Gaussian Processes, which can be chosen from expert knowledge or learned from an initial
deployment. We propose to use GPs not only to model the monitored phenomena, but also to
predict communication costs. We presented a polynomial-time algorithm, pSPIEL, for selecting
Sensor Placements at Informative and cost-Effective Locations. Our algorithm provides strong
theoretical performance guarantees. It is based on a new technique, the modular approximation
graph, that is more general and can also be used, for example, to plan informative paths
for robots. pSPIEL also applies more generally to arbitrary (r, γ)-local submodular sensing quality
functions, and to any communication model where the cost of a sensor deployment can be formalized as
the sum of edge costs connecting the sensors. We extended our pSPIEL approach to obtain sensor
placements that are robust against changes in the environment. We built a complete implementation
on Tmote Sky motes and extensively evaluated our approach on real-world placement problems.
Our empirical evaluation shows that pSPIEL significantly outperforms existing methods.


Future work. While our approach applies to a variety of sensor placement problems, there are
several  open  questions  that  we  leave  as  interesting  directions  for  future  work.  For  example,  it 
would  be  interesting  to  investigate  whether  more  general  notions  of  communication  cost  can  be 
incorporated.  When  taking  into  account  interference  between  sensors,  the  communication  cost  does 
not  only  depend  on  pairwise  distances  between  sensors,  but  on  the  density  of  sensors  deployed 
in  particular  areas.  Another  interesting  direction  is  the  incorporation  of  temporal  effects,  using 
spatiotemporal  models  for  the  observed  phenomenon.  Lastly,  an  interesting  algorithmic  question  is 
whether the assumption of locality can be relaxed.

APPENDIX


Proof of Lemma 2. Given a collection of weights $V = \{p_S : S \subseteq B\}$, we write
$E(V) = \sum_{S \subseteq B} p_S \cdot F(S)$. Note that $\mathbb{E}[F(A)] = E(V_0)$ for
$V_0 = \{\Pr[A = S] : S \subseteq B\}$.

Starting with the set of weights $V_0$, we iteratively apply the following "uncrossing" procedure. As
long as there is a pair of sets $S, T \subseteq B$ such that neither of $S$ or $T$ is contained in the
other, and $p_S, p_T > 0$, we subtract $x = \min(p_S, p_T)$ from both $p_S$ and $p_T$, and we add $x$
to both $p_{S \cap T}$ and $p_{S \cup T}$. Note the following properties of this procedure.

(i) The quantity $\sum_{S \subseteq B} p_S$ remains constant over all iterations.

(ii) For each element $X \in B$, the quantity $\sum_{S \subseteq B : X \in S} p_S$ remains constant over all iterations.

(iii) The quantity $\sum_{S \subseteq B} p_S |S|^2$ strictly increases every iteration.

(iv) By the submodularity of $F$, the quantity $E(V)$ is non-increasing over the iterations.

By (i) and (iii), this sequence of iterations, starting from $V_0$, must terminate at a set of weights $V^*$.
At termination, the sets $S$ on which $p_S > 0$ must be totally ordered with respect to inclusion, and
by (ii) it follows that $p_B \ge \rho$. Finally, by (iv), we have

$$\mathbb{E}[F(A)] = E(V_0) \ge E(V^*) \ge \rho F(B), \tag{11}$$

as required. □
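
The uncrossing procedure can be traced on a toy instance. The Python sketch below is only an illustration: the coverage function, the three-element ground set and the initial distribution are hypothetical choices, and exact fractions are used so that the invariants can be checked without rounding.

```python
from fractions import Fraction
from itertools import combinations

def coverage(S, regions):
    """A monotone submodular F: number of cells covered by the sensors in S."""
    covered = set()
    for v in S:
        covered |= regions[v]
    return len(covered)

def expectation(weights, F):
    """E(V) = sum over S of p_S * F(S)."""
    return sum(p * F(S) for S, p in weights.items())

def uncross(weights):
    """Apply uncrossing steps until the support is a chain under inclusion."""
    w = {S: p for S, p in weights.items() if p > 0}
    while True:
        # Look for two sets with positive weight, neither containing the other.
        pair = next(((S, T) for S, T in combinations(w, 2)
                     if not S <= T and not T <= S), None)
        if pair is None:
            return w
        S, T = pair
        x = min(w[S], w[T])
        # Move weight x from S and T to their intersection and union.
        for U, delta in ((S, -x), (T, -x), (S & T, x), (S | T, x)):
            w[U] = w.get(U, 0) + delta
            if w[U] == 0:
                del w[U]

# Hypothetical instance: sensor v covers the cells in regions[v].
regions = {1: {0, 1}, 2: {1, 2}, 3: {2, 3}}
F = lambda S: coverage(S, regions)
B = frozenset({1, 2, 3})
third = Fraction(1, 3)
weights0 = {frozenset({1, 2}): third, frozenset({2, 3}): third,
            frozenset({1, 3}): third}
# rho: the smallest probability with which an element of B appears in A.
rho = min(sum(p for S, p in weights0.items() if v in S) for v in B)

weights_star = uncross(weights0)
print(expectation(weights0, F), expectation(weights_star, F), rho * F(B))
```

On this instance the final weights are supported on a chain of sets that includes B itself, and the three printed quantities appear in non-increasing order, as in (11).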

In order to prove Theorems 1 and 5, let us consider the subset $A^*$ spanned by the optimal tree
(or path), and let $\tilde{A}^* \subseteq A^*$ denote its r-padded nodes with respect to a random
partition drawn from the padded decomposition. (Recall that each node is r-padded with probability
at least $\rho$.) Now Lemma 2 implies that $\mathbb{E}[F(\tilde{A}^*)]$, the expected value of the
nodes in $A^*$ that are r-padded, is at least $\rho F(A^*)$. The algorithm is based on the idea of
trying to build a tree (a path) that recoups a reasonable fraction of this "padded value".

The following proposition will be useful in converting subtrees and paths of $Q'$ back to solutions
of our original problem.

Proposition 6. Given any subtree $T'$ or path $V'$ of $Q'$ spanning nodes $A'$ with total weight $W$,
containing at least one cluster center, it is possible to find a subtree $T \subseteq Q$ resp. path
$V \subseteq Q$ spanning the same vertices $A'$, with a total length no more than $\ell(T')$ resp.
$\ell(V')$, and with $F(A') \ge \gamma W$.


Proof. Each edge of $Q'$ (and hence of $T'$) corresponds to some shortest path in $Q$, and we can add
all these paths together to form a connected subgraph. Let $T$ be any spanning tree of this subgraph;
clearly, its length is no more than $\ell(T')$. If $P_i' \subseteq P_i$ is the subpath of $P_i$ contained
in $T'$, then the total weight of its vertices $V(P_i')$ is exactly the total submodular value
$F(V(P_i'))$, just by the definition of the weights. Furthermore, since each pair of distinct paths is
at distance at least $r$ from each other, the locality property assures that the value of their union
is at least $\gamma W$. For paths, just observe that expanding edges of $V'$ in $Q$ results in another
path $V$. □

Proposition 7. If the graph $Q$ contains a subtree $T^*$ spanning nodes $A^*$, of length
$c(T^*) = \ell^*$ and value $F(A^*)$, then there is a subtree $T'$ of the graph $Q'$ that has length at most

$$\ell^* \times (\alpha(r + 2) + 2) \tag{12}$$

and whose expected sensing quality is at least

$$F(A^*) \times (1 - e^{-1}) \times \rho. \tag{13}$$


Proof. Let a cluster $C_i$ be called occupied if $\tilde{A}^* \cap C_i \neq \emptyset$; w.l.o.g., let
the $s+1$ clusters $C_0, C_1, \dots, C_s$ be occupied. Let $z_0, \dots, z_s$ be the cluster centers
(i.e., the first nodes picked by the greedy algorithm). We start building $T'$ by adding a spanning
tree on the centers of the clusters that are occupied.

The cost. Let us bound the length of this center-spanning tree. Since $\tilde{A}^*$ contains a point
(say $a_i^*$) from each $C_i$, the padding condition ensures that the r-balls $B_r(a_i^*)$ must be
disjoint, and hence the length of $T^*$ is at least $rs$. Now, to attach $a_i^*$ to $z_i$, we can add
paths of length at most $\alpha r$ to $T^*$, thus causing the resulting tree to have length
$\ell^* + \alpha r s \le (\alpha + 1)\ell^*$. Since this is a Steiner tree on the centers, we can get
a spanning tree of at most twice the cost; hence the cost of the edges connecting the spanning
centers is at most

$$2(\alpha + 1)\,\ell^*. \tag{14}$$

Now consider an occupied cluster $C_i$, and let $|\tilde{A}^* \cap C_i| = n_i$ be the number of padded
nodes in $C_i$. We now add to $T'$ the subpath of $P_i$ containing the first $n_i$ nodes
$\{z_i = G_{i,1}, G_{i,2}, \dots, G_{i,n_i}\}$. Note that the length of the edges added for cluster
$C_i$ is at most $\alpha r n_i$; summing over all occupied clusters gives a total length of
$\alpha r \sum_i n_i \le \alpha r |\tilde{A}^*| \le \alpha r \ell^*$, since each edge in $T^*$ has at
least unit length. Adding this to (14) proves the claim on the length of $T'$.
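
Spelled out, with $\alpha$ denoting the padding parameter and $\ell^*$ the length of $T^*$, the two contributions combine as follows. This only restates the bounds derived above: the first term is (14) and the second is the total length of the chain edges added inside occupied clusters.

```latex
\begin{align*}
\ell(T') &\le \underbrace{2(\alpha + 1)\,\ell^*}_{\text{center-spanning tree, (14)}}
          + \underbrace{\alpha r\,\ell^*}_{\text{chain edges}} \\
         &= \ell^*\,(\alpha r + 2\alpha + 2) \\
         &= \ell^*\,\bigl(\alpha(r + 2) + 2\bigr),
\end{align*}
```

which is the length bound (12).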

The value. Finally, let us calculate the sensing quality value of the tree $T'$: by the properties of
the greedy algorithm used in the construction of $Q'$, the total weight of the set of nodes
$G_{i,1}, \dots, G_{i,n_i}$ added in cluster $C_i$ is at least

$$(1 - e^{-1})\,F(\tilde{A}^* \cap C_i). \tag{15}$$

Summing this over occupied clusters, we get that the total value is at least
$(1 - e^{-1})F(\tilde{A}^*)$, whose expected value is at least $(1 - e^{-1})\rho F(A^*)$. □

Proposition 8. If the graph $Q$ contains an $s$-$t$ path $V^*$ spanning nodes $A^*$, of length
$c(V^*) = \ell^*$ and value $F(A^*)$, then there is an $s$-$t$ path $V'$ of the graph $Q'$ that has
length at most

$$2 \times \ell^* \times (\alpha(r + 2) + 6) \tag{16}$$

and whose expected sensing quality is at least

$$F(A^*) \times (1 - e^{-1}) \times \rho. \tag{17}$$

Proof. This result is an immediate corollary of Proposition 7, noting that the vertices $s$ and $t$
need to be connected to the center-spanning tree by paths of length at most $2\ell^*$, and that the
path $V^*$ is a tree, and hence there exists a subtree $T'$ in $Q'$ of length at most
$\ell^* \times (\alpha(r + 2) + 4)$. This tree is readily converted into a path (e.g., by traversal)
of length at most twice the cost of the tree, i.e., of at most
$2 \times \ell^* \times (\alpha(r + 2) + 6)$. □
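
The factor of two in the traversal step can be checked on a small example: a depth-first walk that returns along every edge uses each tree edge exactly twice. The Python sketch below uses a hypothetical weighted tree; node names and edge lengths are illustrative only. When edge lengths satisfy the triangle inequality, shortcutting repeated nodes of such a walk cannot increase its length.

```python
def closed_walk(tree, root):
    """Depth-first traversal that returns along every edge.

    `tree` maps each node to a dict {neighbor: edge_length} (undirected).
    Returns the visited node sequence and the total walk length.
    """
    walk, length = [root], 0.0

    def visit(node, parent):
        nonlocal length
        for child, w in tree[node].items():
            if child == parent:
                continue
            walk.append(child)
            length += w          # go down the edge
            visit(child, node)
            walk.append(node)
            length += w          # come back up the same edge

    visit(root, None)
    return walk, length

# Hypothetical tree: s - a - t, with a side branch a - b.
tree = {
    "s": {"a": 2.0},
    "a": {"s": 2.0, "t": 3.0, "b": 1.0},
    "t": {"a": 3.0},
    "b": {"a": 1.0},
}
# Each undirected edge is listed twice in the adjacency dicts.
tree_cost = sum(w for u in tree for w in tree[u].values()) / 2
walk, walk_cost = closed_walk(tree, "s")
print(walk, walk_cost, tree_cost)
```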

Combining these results, we now prove a slightly more detailed statement of Theorems 1 and 5:


Theorem 9. Trees: For the covering problem (1), pSPIEL will find a solution $T$ with cost at most

$$\kappa_{\mathrm{Quota}}\,\ell^*\,(\alpha(r + 2) + 2) \tag{18}$$

and whose expected sensing quality is at least

$$(1 - e^{-1})\,\gamma\,\rho\,F(A^*), \tag{19}$$

where $\ell^*$ is the cost of the optimum tree $T^*$ spanning $A^*$. For the maximization problem (2),
pSPIEL will find a solution $T$ with cost at most

$$\ell^*\,(\alpha(r + 2) + 2) \tag{20}$$

and whose expected sensing quality is at least

$$\frac{1}{\kappa_{\mathrm{Budget}}}\,(1 - e^{-1})\,\gamma\,\rho\,F(A^*), \tag{21}$$

where $\kappa_{\mathrm{Quota}}$ and $\kappa_{\mathrm{Budget}}$ denote the approximation guarantees for
approximately solving Quota- and Budget-MST problems (currently, $\kappa_{\mathrm{Quota}} = 2$ and
$\kappa_{\mathrm{Budget}} = 3 + \varepsilon$, for $\varepsilon > 0$, are the best known such
guarantees; Garg, 2005; Johnson et al., 2000).

Paths: For the path planning problem (10), pSPIEL will find a solution $V$ with cost at most

$$\kappa_{\mathrm{Orient}}\,2\,\ell^*\,(\alpha(r + 2) + 6) \tag{22}$$

and whose expected sensing quality is at least

$$(1 - e^{-1})\,\gamma\,\rho\,F(A^*), \tag{23}$$

where $\ell^*$ is the cost of the optimum path $V^*$ spanning $A^*$. Hereby, $\kappa_{\mathrm{Orient}}$
is the approximation guarantee for approximately solving the Orienteering problem (currently,
$\kappa_{\mathrm{Orient}} = 3 + \varepsilon$ is the best known such guarantee (Chekuri et al., 2008)).

Proof. Proposition 7 proves the existence of a tree $T'$ in the graph $Q'$, for which both cost and
total weight are close to those of the optimal tree $T^*$ in $Q$. The construction in the proof also
guarantees that the tree $T'$ contains at least one cluster center $G_{i,1}$ for some $i$ (or is
empty, in which case $T^*$ is empty). Proposition 6 handles the transfer of the solution to the
original graph $Q$. Hence, in order to solve the covering problem (1) or optimization problem (2) in
$Q$, we need to solve the respective covering and maximization problem in the modular approximation
graph (MAG) $Q'$, rooted in one of the cluster centers. Any $\kappa_{\mathrm{Quota}}$-approximation
algorithm for the Quota-MST problem can be used for the covering problem, using a quota of
$(1 - e^{-1})\rho F(A^*)$. While for the unrooted version of the Budget-MST problem there is a
constant factor $\kappa_{\mathrm{Budget}} = 3 + \varepsilon$ approximation algorithm, the best
known approximation for rooted Budget-MST is $4 + \varepsilon$ by Chekuri et al. (2008). We can,
however, exploit the structure of the MAG to prove Theorem 1 with the improved guarantee of
$3 + \varepsilon$. We simply need to prune all nodes in $Q'$ which are further than
$B = \ell^*(\alpha(r + 2) + 2)$ away from the core of $Q'$, and then run the unrooted approximation
algorithm (Johnson et al., 2000) on $Q'$. If this algorithm, started with budget
$B = \ell^*(\alpha(r + 2) + 2)$, selects nodes from sub-chain $i$, not including center $G_{i,1}$, we
instead select the entire $i$-th chain. By construction, this procedure is guaranteed not to violate
the budget, and the submodular function value can only increase.

The result for paths follows analogously, using Proposition 8 instead of Proposition 7, and applying
the approximation algorithm for $s$-$t$ orienteering to the modular approximation graph. □
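
The chain-completion step used in the tree case of the proof can be sketched as follows. The data layout (each chain as a list whose first entry is its cluster center), the node labels and the function name `complete_chains` are assumptions made for illustration; the budgeted tree algorithm itself is treated as a black box that returns a set of selected nodes.

```python
def complete_chains(chains, selected):
    """If nodes of a chain were selected without its center (first entry),
    select the entire chain instead; other chains are left unchanged."""
    result = set(selected)
    for chain in chains:
        center = chain[0]
        touched = any(node in selected for node in chain)
        if touched and center not in selected:
            result.update(chain)
    return result

# Hypothetical chains; "g21" is the center of the second chain.
chains = [["g11", "g12", "g13"], ["g21", "g22"], ["g31", "g32"]]
selected = {"g11", "g12", "g22"}
print(sorted(complete_chains(chains, selected)))
```

The returned set always contains the original selection, so a monotone sensing quality function cannot decrease under this step.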

